Extracting method and device of complex named entity

A complex named entity extraction technology, applied in the field of information extraction, that addresses the low accuracy of existing extraction approaches and improves extraction accuracy.

Inactive Publication Date: 2013-10-23
SHENZHEN SHI JI GUANG SU INFORMATION TECH
0 Cites, 10 Cited by

AI Technical Summary

Problems solved by technology

[0005] The embodiment of the present invention provides a method for extracting complex named entities, aiming to solve the problem of low accuracy in existing complex named entity extraction.

Examples

Embodiment 1

[0034] Figure 1 shows the flow of the complex named entity extraction method provided by the first embodiment of the present invention. The details are as follows:

[0035] In step S11, the text data of the text is filtered, and the filtered text data is connected into a long string through a designated connector, wherein the text data before filtering includes Chinese characters and English characters of the text.

[0036] In this embodiment, the text that needs to extract complex named entities is obtained, and the text data of the text is filtered, wherein the text data before filtering includes Chinese characters and English characters of the text.

[0037] In this embodiment, the step of filtering the text data of the text specifically includes: filtering unrecognizable Chinese characters, English characters, and punctuation marks in the text; and/or filtering Chinese characters, English characters and punctuation marks that are recognizable in the text but have a usage frequency lower than a preset usage frequency. This embodiment mainly filters out unrecognizable Chinese c...
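Step S11 can be illustrated with a minimal Python sketch. The excerpt does not specify which characters count as recognizable or which connector is used, so the character ranges, the `PUNCTUATION` set (which also treats a space as a separator), the `#` connector, and the function names `filter_text` and `connect_substrings` are all assumptions made for illustration.

```python
import re

# Assumption: recognizable characters are CJK ideographs, ASCII letters and
# the separators below; the patent excerpt does not list the exact sets.
PUNCTUATION = " ,.!?;:，。！？；：、"
CONNECTOR = "#"  # hypothetical choice for the designated connector

_ALLOWED = re.compile("[\u4e00-\u9fffA-Za-z" + re.escape(PUNCTUATION) + "]")
_SPLIT = re.compile("[" + re.escape(PUNCTUATION) + "]+")


def filter_text(text: str) -> str:
    """Drop every character outside the assumed recognizable set."""
    return "".join(ch for ch in text if _ALLOWED.match(ch))


def connect_substrings(filtered: str, connector: str = CONNECTOR) -> str:
    """Split on separators and join the non-empty substrings with the connector."""
    parts = _SPLIT.split(filtered)
    return connector.join(part for part in parts if part)


long_string = connect_substrings(filter_text("刘德华★新片, Hello world!"))
print(long_string)
```

Joining with a connector that cannot occur in the filtered text keeps a later repeated-string search from matching across the boundary between two original substrings.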

Embodiment 2

[0057] Figure 2 shows the structure of the device for extracting complex named entities provided by the second embodiment of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown.

[0058] The device for extracting complex named entities can be used in various information processing terminals connected to servers through wired or wireless networks, such as mobile phones, pocket computers (Pocket Personal Computer, PPC), palmtop computers, computers, notebook computers, and personal digital assistants (Personal Digital Assistant, PDA). It can be a software unit, a hardware unit, or a combination of software and hardware running in these terminals, and it can also be integrated into these terminals as an independent component or run in the application system of these terminals.

[0059] The text data connection unit 21 is used to filter the text data of the text, and connect the f...

Embodiment 3

[0069] Figure 3 shows the structure of another device for extracting complex named entities provided by the third embodiment of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown.

[0070] The complex named entity extraction device includes: a text data connection unit 21, an ordered sequence acquisition unit 22, a repeated string acquisition unit 23 and a complex named entity extraction unit 24.

[0071] Optionally, the text data connection unit 21 includes: a data filtering module and a substring connection module.

[0072] The data filtering module is used to filter unrecognizable Chinese characters, English characters and punctuation marks in the text, and/or filter Chinese characters, English characters and punctuation marks that are recognizable in the text but have a usage frequency lower than a preset usage frequency.
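The frequency-based part of the data filtering module could look roughly like the Python sketch below. The excerpt does not say how usage frequency is measured, so this sketch assumes a per-character relative frequency table built from a reference corpus; `build_usage_table`, `filter_by_usage_frequency`, and the threshold value are hypothetical names and choices.

```python
from collections import Counter


def build_usage_table(corpus: str) -> dict:
    """Relative frequency of each character in a reference corpus (assumed source)."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def filter_by_usage_frequency(text: str, usage: dict, min_freq: float) -> str:
    """Keep only characters whose usage frequency reaches the preset threshold.

    Characters missing from the table get frequency 0.0, so unrecognized
    characters are dropped by the same test.
    """
    return "".join(ch for ch in text if usage.get(ch, 0.0) >= min_freq)


usage = build_usage_table("aaab")  # a -> 0.75, b -> 0.25
print(filter_by_usage_frequency("abc", usage, min_freq=0.5))
```

Treating a missing table entry as frequency zero lets one pass handle both filters named in the paragraph, although the patent may implement them separately.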

[0073] The substring connection module is used to connect th...

Abstract

The invention is applicable to the field of information extraction and provides a method and a device for extracting complex named entities. The method comprises: filtering the text data of a text; connecting the substrings separated by punctuation marks in the filtered text data into a long string through a specified connector; recording the starting position of each Chinese character or English character of the long string and storing the recorded positions in an established suffix data set to determine an ordered sequence of the suffixes in the suffix data set; determining the longest common prefix of adjacent suffixes according to the ordered sequence and taking it as a repeated string of the text; and extracting the complex named entities of the text according to at least two of the frequency of the repeated string, the mutual information of the repeated string and the independence of the repeated string. The method and the device yield more accurate complex named entities and improve extraction accuracy.
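The repeated-string step in the abstract resembles a suffix array combined with longest-common-prefix comparison of neighbouring suffixes. The Python sketch below illustrates that idea under stated assumptions: the suffix data set is a plain list of start positions sorted naively (fine for short strings, slow for long ones), the connector is `#`, and the names `suffix_array`, `common_prefix_length`, `repeated_strings`, `frequency` and the `min_len` cutoff are hypothetical.

```python
def suffix_array(s: str) -> list:
    """Start positions of all suffixes, ordered by the suffix text."""
    return sorted(range(len(s)), key=lambda i: s[i:])


def common_prefix_length(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def repeated_strings(s: str, connector: str = "#", min_len: int = 2) -> set:
    """Longest common prefixes of adjacent suffixes, taken as repeated strings."""
    order = suffix_array(s)
    found = set()
    for i, j in zip(order, order[1:]):
        n = common_prefix_length(s[i:], s[j:])
        # Cut at the connector so a repeat never spans two original substrings.
        prefix = s[i:i + n].split(connector)[0]
        if len(prefix) >= min_len:
            found.add(prefix)
    return found


def frequency(s: str, sub: str) -> int:
    """Number of (possibly overlapping) occurrences of sub in s."""
    return sum(1 for i in range(len(s)) if s.startswith(sub, i))


text = "刘德华新片#刘德华演唱会#新片上映"
for candidate in sorted(repeated_strings(text)):
    print(candidate, frequency(text, candidate))
```

Candidates such as a fragment of a longer name would still appear at this stage; the abstract says the final selection uses at least two of frequency, mutual information and independence, and the latter two are not sketched here because the excerpt does not define them.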

Description

Technical field

[0001] The invention belongs to the field of information extraction, and in particular relates to a complex named entity extraction method and device.

Background technique

[0002] With the development of network technology, video sharing websites such as Qiqi HD and Tudou have developed rapidly at home and abroad. How to accurately and effectively extract text information from video pages or other Web pages has become an important issue in the field of information extraction.

[0003] Video pages or other web pages contain a large amount of text information, such as actor names, TV drama names, hot event names, etc. If this information can be extracted from video pages or other web pages, the speed at which users search for information will be greatly improved. The common characteristics of the above-mentioned actor names, TV drama names, and hot event names are that they are long and are not contained in ordinary dictionaries. The above-mentioned names are call...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventor: 蒋喻新辛国茂
Owner: SHENZHEN SHI JI GUANG SU INFORMATION TECH