Open Chinese entity relation extraction method using dependency analysis

A technique combining dependency analysis with entity relation extraction, applied in the field of natural language information extraction, that addresses problems such as complex Chinese grammar and helps ensure extraction accuracy.

Active Publication Date: 2015-09-23
上海兑观信息科技技术有限公司

AI Technical Summary

Problems solved by technology

Because Chinese has complex grammar, varied expressions, and rich semantics, entity relation extraction methods designed for English are difficult to apply directly to Chinese entity relation extraction.


Examples

Embodiment

[0049] Referring to Figures 1-4, large-scale free text is input and then preprocessed.

[0050] Step 1: Because the free text contains a large number of HTML tags and other noise, the body text is extracted from the input using the Vision-based Page Segmentation (VIPS) algorithm.
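The noise-removal goal of Step 1 can be illustrated with a much simpler stand-in. The sketch below is not VIPS: it only strips tags with Python's standard `html.parser` and ignores the visual layout that VIPS relies on. The names `TextExtractor` and `strip_html` are hypothetical and chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def strip_html(html: str) -> str:
    """Return the text of an HTML string, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)
```

Tag stripping of this kind keeps navigation menus and advertisements that a layout-aware segmenter would be expected to discard, so it is only a rough approximation of the preprocessing described here.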

[0051] Step 2: The extracted text is split into sentences at periods, question marks, and exclamation marks, yielding a set of single sentences.
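A minimal sketch of the sentence splitting in Step 2, assuming the delimiters are the full-width marks 。？！ plus ASCII `?` and `!`. The ASCII period is left out here so that decimals are not split; that choice, and the function name `split_sentences`, are assumptions rather than details from the patent.

```python
import re

# One sentence = a run of non-terminator characters followed by any
# terminators that close it (the last sentence may have none).
_SENTENCE = re.compile(r"[^。？！?!]+[。？！?!]*")


def split_sentences(text: str) -> list:
    """Split text into single sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]
```

For instance, `split_sentences("今天天气很好。你去哪里？")` yields two sentences, each still carrying its final punctuation mark.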

[0052] Step 3: Unlike English, Chinese has no spaces between words to serve as obvious segmentation boundaries. Using the Language Technology Platform Cloud (Language Technology Platform, LTP) of the Social Computing and Information Retrieval Research Center of Harbin Institute of Technology, each single sentence undergoes word segmentation, part-of-speech tagging, named entity recognition, and dependency analysis. For example, using LTP to analyze the sentence "Shanghai Municipal Public Security Bureau and Shanghai Custo...


Abstract

The invention discloses an open Chinese entity relation extraction method using dependency analysis. The method first performs dependency analysis on each sentence; it then combines Chinese grammar heuristic rules with the dependency analysis result to extract relation words; next, it determines the positions of the named entities according to distance; finally, it outputs the result as triples. Experiments were carried out on the SogouCA and SogouCS corpora. The results show that the method is applicable to large-scale corpora and has good portability. The method is intended to overcome the limitations posed by the intrinsic properties of Chinese, such as complicated grammar, diverse modes of expression, and rich semantics.
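The pipeline in the abstract (relation word, then nearest entities, then triple) can be sketched on a hand-annotated sentence. Everything below is an illustrative assumption: the `Token` fields, the tag and label values, the rule "the root verb is the relation word", and the rule "take the nearest entity on each side" are stand-ins, not the heuristic rules or distance criterion actually claimed in the patent. In practice the annotations would come from a parser such as LTP.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    index: int       # position in the sentence (must equal the list index)
    word: str
    pos: str         # part-of-speech tag; "v" marks a verb (assumed tag set)
    is_entity: bool  # True if a named entity recognizer marked this token
    head: int        # index of the dependency head, -1 for the root
    deprel: str      # dependency label (illustrative values only)


def find_relation_words(tokens: List[Token]) -> List[Token]:
    # Hypothetical heuristic: a verb that is the root of the dependency tree.
    return [t for t in tokens if t.pos == "v" and t.head == -1]


def nearest_entity(tokens: List[Token], center: int, step: int) -> Optional[Token]:
    # Walk left (step=-1) or right (step=+1) until an entity is found.
    i = center + step
    while 0 <= i < len(tokens):
        if tokens[i].is_entity:
            return tokens[i]
        i += step
    return None


def extract_triples(tokens: List[Token]) -> List[Tuple[str, str, str]]:
    triples = []
    for rel in find_relation_words(tokens):
        left = nearest_entity(tokens, rel.index, -1)
        right = nearest_entity(tokens, rel.index, +1)
        if left is not None and right is not None:
            triples.append((left.word, rel.word, right.word))
    return triples


# Hand-annotated toy sentence "李明 访问 上海" (invented for this sketch).
example = [
    Token(0, "李明", "nh", True, 1, "SBV"),
    Token(1, "访问", "v", False, -1, "HED"),
    Token(2, "上海", "ns", True, 1, "VOB"),
]
print(extract_triples(example))
```

Choosing entities by token distance alone ignores the dependency arcs once the relation word is found; a fuller version would presumably also check that each entity is syntactically attached to the relation word.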

Description

Technical field

[0001] The invention relates to the technical field of natural language information extraction, in particular to an open Chinese entity relation extraction method using dependency analysis.

Background

[0002] In recent years, with the development of Internet technology, the World Wide Web has gradually become an inexhaustible source of information. How to quickly obtain the information that users are interested in has become a focus of research. It is against this background that Information Extraction (IE) technology emerged. The main purpose of information extraction is to extract specified entities (Entity), relations (Relation), events (Event), and other factual information from natural language texts, transforming unstructured information in text into structured information. Entity relation extraction (Relation Extraction, RE) refers to determining whether a certain semantic relationship exists between entities. It is an important par...

Application Information

IPC(8): G06F17/27
Inventors: 杨静, 李明耀, 贺樑
Owner: 上海兑观信息科技技术有限公司