A method and device for extracting entity attribute information based on syntactic dependency

A technology for extracting entity attributes and attribute information, applied in fields such as instruments, computing, and electrical digital data processing. It addresses the problem of attribute misalignment in information extraction methods, reducing workload and improving both efficiency and accuracy.

Active Publication Date: 2021-06-01
湖南星汉数智科技有限公司

AI Technical Summary

Problems solved by technology

[0006] Purpose of the invention: To solve the technical problem of attribute misalignment in existing information extraction methods based on natural language processing, the invention provides a method and device for extracting entity attribute information based on syntactic dependency. It combines natural language processing with graph theory: the syntactic dependency tree produced by natural language processing is used to create an undirected weighted graph, a shortest-path algorithm from graph theory searches for the shortest associated path between an entity and its associated information, and the semantic similarity between the words on that path and the attribute keywords is calculated, so that the attributes of entities and associated information are aligned automatically.
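The graph step described above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dependency edges are hand-written rather than parser output, the unit edge weights are an assumption (the source does not state how weights are assigned), and the names `build_graph` and `shortest_path` are hypothetical.

```python
import heapq
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (head, dependent, weight) triples."""
    graph = defaultdict(dict)
    for head, dependent, weight in edges:
        # Store each dependency arc in both directions to make it undirected.
        graph[head][dependent] = weight
        graph[dependent][head] = weight
    return graph


def shortest_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path), or (inf, []) if unreachable."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []


# Hypothetical dependency arcs for the Embodiment 1 sentence (assumed, unit weights).
edges = [
    ("born", "Deng Chao", 1.0),
    ("born", "Nanchang", 1.0),
    ("born", "1979", 1.0),
    ("born", "admitted", 1.0),
    ("admitted", "Central Academy of Drama", 1.0),
    ("admitted", "1998", 1.0),
]
graph = build_graph(edges)
cost, path = shortest_path(graph, "Deng Chao", "Central Academy of Drama")
```

The words on the returned path (here the verbs linking the entity to the candidate value) are what the later similarity step would compare against attribute keywords.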

Examples

Embodiment 1

[0055] Referring to Figures 1-2, the method of extracting entity attribute information based on the syntactic dependency path is explained in detail, taking the text "Deng Chao, born in Nanchang, Jiangxi Province in 1979, and admitted to the Performance Department of the Central Academy of Drama in 1998." as an example:

[0056] Step 1: According to the keyword request entered by the user, the text to be extracted is obtained from the Internet with the help of existing crawler software, and the text to be extracted is preprocessed to obtain the text entity to be extracted;

[0057] Step 1.1: Denote the text to be extracted, "Deng Chao, born in Nanchang, Jiangxi Province in 1979, and was admitted to the Performance Department of the Central Academy of Drama in 1998.", as I. Use the HanLP open source tool to segment the text I and obtain the word set after segmentation, denoted as W;

[0058] Step 1.2: Use the HanLP open source tool to perform part-of-speech tagging and named e...

Embodiment 2

[0081] The method of extracting entity-related information based on the syntactic dependency path is now described in detail, taking the text "Yuan Hong, graduated from the Shanghai Theater Academy, and is Hu Ge's classmate and friend." as an example:

[0082] Step 1: Preprocess the text to be extracted to obtain the text entity to be extracted;

[0083] Step 1.1: Denote the text to be extracted, "Yuan Hong, graduated from the Shanghai Theater Academy, and is a classmate of Hu Ge.", as I. Use the Stanford open source NLP tool to process the text I and obtain the word set after segmentation, denoted as W. The word set is shown in Figure 3, where NN denotes a common noun, PU punctuation, VV a verb, NR a proper noun, VC the copula ("是", "to be"), and DEG an auxiliary particle (associative "的");
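As a rough sketch of how the part-of-speech tags could be used to pick candidate attribute information: the tokens below are tagged by hand for illustration and are not the actual tool output shown in Figure 3, and the choice of NR and NN as candidate tags is an assumption, since the source only says candidates are obtained "according to the part-of-speech relationship". The function name `candidate_words` is hypothetical.

```python
# Tag meanings as listed in Step 1.1.
TAG_MEANINGS = {
    "NN": "common noun",
    "PU": "punctuation",
    "VV": "verb",
    "NR": "proper noun",
    "VC": "copula",
    "DEG": "associative particle",
}

# Hand-tagged tokens (illustrative only, not real tagger output).
tagged = [
    ("Yuan Hong", "NR"),
    (",", "PU"),
    ("graduated", "VV"),
    ("Shanghai Theater Academy", "NR"),
    (",", "PU"),
    ("is", "VC"),
    ("Hu Ge", "NR"),
    ("'s", "DEG"),
    ("classmate", "NN"),
]


def candidate_words(tokens, entity, keep_tags=("NR", "NN")):
    """Return words whose tag is in keep_tags, excluding the entity itself."""
    return [word for word, tag in tokens if tag in keep_tags and word != entity]


candidates = candidate_words(tagged, entity="Yuan Hong")
```

Each candidate would then be connected to the entity through the dependency graph in the later steps.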

[0084] Step 1.2: Use the Stanford open source NLP tool to perform part-of-speech tagging and named entity recognition on the word set. The obtained word ...

Embodiment 3

[0109] Referring to Figure 5, the present invention also discloses a device for extracting entity-related information based on the syntactic dependency path, including:

[0110] The preprocessing module is used to obtain the text to be extracted from the Internet by means of the existing crawler software according to the keyword request input by the user, and preprocess the text to be extracted to obtain the text entity to be extracted;

[0111] The path calculation module is used to establish an undirected weighted graph between words according to the syntactic dependency and part-of-speech relationships of the text to be extracted, and to obtain the candidate attribute information of the text entity to be extracted according to the part-of-speech relationship; it searches the undirected weighted graph for the shortest path between the text entity to be extracted and the words of the candidate attribute information, and the words along the shortest path form a set of associated infor...

Abstract

The invention discloses a method and device for extracting entity attribute information based on syntactic dependency. The method first preprocesses the text to be extracted to obtain the text entity to be extracted. Then, according to the syntactic dependency and part-of-speech relationships of the text, it establishes an undirected weighted graph between words and obtains the candidate attribute information of the text entity according to the part-of-speech relationship. It searches for the shortest path between the text entity and the words of the candidate attribute information, and the words along the shortest path form a set of associated information words. Finally, the semantic similarity between each attribute in the attribute set and the associated information word set is calculated to obtain the entity attribute, and the entity, entity attribute, and attribute information are combined as the final extraction result. The invention combines natural language processing technology with a graph theory model to resolve the ambiguity of text information and improve the accuracy of text extraction; the semantic similarity of keywords is used to automatically determine the attribute of the extracted information, which improves extraction efficiency.
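The final alignment step can be illustrated with a small sketch. Everything here is an assumption made for illustration: the source does not specify the similarity measure, so cosine similarity over toy two-dimensional vectors stands in for it, and the attribute names, keywords, vectors, and the function `align_attribute` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def align_attribute(path_words, attribute_keywords, vectors):
    """Pick the attribute whose keywords are most similar to any word on the path."""
    best_attr, best_score = None, -1.0
    for attr, keywords in attribute_keywords.items():
        for keyword in keywords:
            for word in path_words:
                if keyword in vectors and word in vectors:
                    score = cosine(vectors[keyword], vectors[word])
                    if score > best_score:
                        best_attr, best_score = attr, score
    return best_attr, best_score


# Toy word vectors (made up; a real system would use trained embeddings).
vectors = {
    "born": [1.0, 0.0],
    "birth": [0.9, 0.1],
    "admitted": [0.0, 1.0],
    "enroll": [0.1, 0.9],
}
attribute_keywords = {"birthplace": ["birth"], "education": ["enroll"]}

attr_a, _ = align_attribute(["born"], attribute_keywords, vectors)
attr_b, _ = align_attribute(["admitted"], attribute_keywords, vectors)
```

With these toy vectors, a path containing "born" aligns to the hypothetical "birthplace" attribute and a path containing "admitted" aligns to "education", giving the (entity, attribute, attribute information) triple the abstract describes.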

Description

Technical field

[0001] The invention relates to the technical field of natural language processing, and in particular to a method and device for extracting entity attribute information based on syntactic dependency.

Background

[0002] With the rapid development of Internet applications, the number of web pages and texts on the Internet is increasing exponentially. How to extract effective and practical information from these massive volumes of web pages and texts has become a hot research and development topic in industry and academia. At present, information extraction from structured text has made great progress and is widely used. However, unstructured free text has complex and changeable presentation forms, its semantics are diverse and ambiguous, and it contains a large amount of invalid and interfering material such as text and pictures, so the difficulty of extracting information from free text is further increased....

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/279
CPC: G06F40/279
Inventor: 郭建京, 彭建辉
Owner: 湖南星汉数智科技有限公司