Relation extraction method based on convolutional neural network and distant supervision

This convolutional neural network relation extraction technology is applied in the fields of neural networks, natural language processing, information extraction and relation extraction. It addresses problems such as the declining accuracy of unsupervised methods in information extraction tasks and the unsuitability of supervised methods for large-scale open-domain tasks.

Active Publication Date: 2016-10-26
杭州量知数据科技有限公司

AI Technical Summary

Problems solved by technology

Supervised methods require the training data set to be labeled manually in advance, which is labor-intensive, so they are not suitable for large-scale open-domain information extraction tasks.
[0005] Second, unsupervised methods:
However, after a large number of iterations the accuracy usually drops considerably because labeling errors accumulate. This phenomenon is called the "semantic drift" problem.



Examples


Embodiment

[0089] Taking as an example the KBP2010 relation extraction task, completed using about 820,000 Wikipedia entries and a large New York Times corpus, the implementation steps of the present invention are as follows:

[0090] Notes:

[0091] Each Wikipedia entry corresponds to an entity, and the entity's related attributes are listed in the entry's Info Box. Each entry also has an associated article, that is, its text content. The New York Times Corpus is a large collection of news texts from the New York Times and contains a large amount of unstructured information.

[0092] 1. Map the Info Box information on Wikipedia to the corresponding KBP attribute types. For example, map the University:established relationship to the Org:founded target attribute. Wikipedia attributes that have no counterpart in the task are simply ignored, and some mappings are one-to-many;

[0093] 2. Find the entity alias correspondin...
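
Step 1 above, the attribute mapping, can be sketched roughly as a lookup table. This is only a minimal illustration, not the patent's implementation: apart from University:established and Org:founded, which appear in the text, every attribute name, slot name, entity and value below is hypothetical, as is the helper `map_infobox`.

```python
# Hypothetical mapping table: Wikipedia Info Box attribute -> KBP attribute types.
# Only "University:established" -> "Org:founded" comes from the text;
# the other entries are invented purely for illustration.
INFOBOX_TO_KBP = {
    "University:established": ["Org:founded"],
    "Company:founded": ["Org:founded"],
    # Assumed example of a one-to-many mapping.
    "Person:birth_place": ["Per:city_of_birth", "Per:country_of_birth"],
}


def map_infobox(entity, infobox):
    """Return (entity, kbp_attribute, value) triples.

    Attributes without an entry in the mapping table are ignored.
    """
    triples = []
    for attribute, value in infobox.items():
        for kbp_attribute in INFOBOX_TO_KBP.get(attribute, []):
            triples.append((entity, kbp_attribute, value))
    return triples


# Made-up Info Box: one mapped attribute and one that is not in the task.
facts = map_infobox(
    "Example University",
    {"University:established": "1900", "University:mascot": "owl"},
)
print(facts)
```

Storing a list of targets per attribute keeps one-to-one and one-to-many mappings in the same structure, and a missing key naturally expresses "not in the task".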



Abstract

The invention discloses a relation extraction method based on a convolutional neural network and distant supervision. The method comprises the steps of: (1) mapping a small number of existing relations to the required relation types; (2) expanding the different ways in which the entities in the existing relations are expressed; (3) obtaining a large amount of relevant unstructured text from the internet and building indexes; (4) querying the indexes for sentences relevant to the entities and classifying the sentences into positive and negative samples; (5) converting the samples into feature vectors with the convolutional neural network; and (6) classifying the texts using the obtained feature vectors, thereby obtaining new relation pairs. Based on the assumption that one sentence may express one relation, the method starts from a small number of known relations and uses a large amount of unstructured text from the internet to obtain a large amount of new structured information, that is, to discover new relations.
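
Steps (3) and (4) of the abstract can be illustrated with a small sketch. It is written under stated assumptions and is not the patented procedure: the inverted index, the rule that a sentence mentioning an entity pair with no known relation counts as a negative ("NA") sample, and all sentences, entities and function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_index(sentences):
    """Inverted index: lowercased token -> set of sentence ids."""
    index = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for token in sentence.lower().split():
            index[token.strip(".,;:!?")].add(sid)
    return index


def sentences_with(index, mention):
    """Ids of sentences that contain every token of the mention."""
    ids = [index.get(token, set()) for token in mention.lower().split()]
    return set.intersection(*ids) if ids else set()


def label_samples(sentences, known_relations, entities):
    """Label sentences that mention an entity pair (distant supervision).

    Assumption: a pair with a known relation yields a positive sample,
    any other co-occurring pair yields a negative sample tagged "NA".
    """
    index = build_index(sentences)
    samples = []
    for e1, e2 in combinations(entities, 2):
        shared = sentences_with(index, e1) & sentences_with(index, e2)
        for sid in sorted(shared):
            text = sentences[sid].lower()
            # The index only matches bags of tokens, so confirm the full mention.
            if e1.lower() in text and e2.lower() in text:
                relation = (known_relations.get((e1, e2))
                            or known_relations.get((e2, e1))
                            or "NA")
                samples.append((sentences[sid], e1, e2, relation))
    return samples


# Made-up corpus and knowledge base, for illustration only.
sentences = [
    "Acme University was founded in 1900.",
    "Acme University opened a new library near Riverton.",
]
known = {("Acme University", "1900"): "Org:founded"}
samples = label_samples(sentences, known, ["Acme University", "1900", "Riverton"])
for sample in samples:
    print(sample)
```

The labeled samples would then be passed to steps (5) and (6), where the convolutional neural network turns each sentence into a feature vector for classification.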

Description

Technical field

[0001] The invention relates to neural networks, natural language processing, information extraction and relation extraction, and in particular to a relation extraction method based on a convolutional neural network and distant supervision.

Background

[0002] In recent years, with the rapid development of the Internet, the amount of content and knowledge on the Internet has kept growing, even exponentially, including news, blogs, emails, government documents, chat records and so on. However, these data are all unstructured electronic texts. How can humans easily understand all this data? A very good idea is to convert these unstructured data into structured semantic information. However, the huge amount of data makes it very difficult or even impossible to annotate this information manually. Therefore, it is hoped that these data can be marked into a text structure that is easy for humans to understand and read through computers and computer technol...


Application Information

IPC(8): G06F17/30, G06F17/27, G06N3/02
CPC: G06F16/9535, G06F40/211, G06N3/02
Inventor 凌立刚, 朱海鹏
Owner 杭州量知数据科技有限公司