A remote-supervised dual-attention relationship classification method and system

A technology of relation classification and remote supervision, applied in text database clustering/classification, unstructured text data retrieval, instruments, etc. It addresses problems such as noise in model training, weak performance, and high cost, thereby reducing noisy data, avoiding error propagation, and improving accuracy.

Active Publication Date: 2020-10-02
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT
Cites: 8; Cited by: 0

AI Technical Summary

Problems solved by technology

The current mainstream relation extraction method is relation classification based on neural network learning, which faces three major problems: the difficulty of representing and mining semantic features, error propagation caused by manual labeling, and the effect of noise on model training.
Some improved convolutional network models can model information over a longer span by stacking structures such as K-segment max pooling; for example, PCNNs (Piecewise CNNs) have been evaluated with three-segment pooling. Compared with Bi-LSTM, however, max pooling is relatively costly and performs relatively poorly when extracting semantic features with long-range dependencies, such as those in long texts.
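For contrast with the Bi-LSTM encoder used by the invention, the sketch below shows what three-segment (piecewise) max pooling computes: the feature map is split at the two entity positions and each segment is max-pooled per channel, so the output keeps coarse positional structure that a single global max would lose. It is a minimal illustration, not PCNN reference code; the convolution itself is omitted, and the segment-boundary and empty-segment conventions are assumptions.

```python
from typing import List, Sequence


def piecewise_max_pool(
    features: Sequence[Sequence[float]], e1_pos: int, e2_pos: int
) -> List[float]:
    """Three-segment (PCNN-style) max pooling over a convolutional feature map.

    features: one feature vector per token (all the same length).
    e1_pos, e2_pos: token indices of the two entity mentions.
    Returns 3 * num_channels values: the per-channel maxima of each segment.
    """
    if not features:
        raise ValueError("feature map is empty")
    left, right = sorted((e1_pos, e2_pos))
    # Assumed boundary convention: each entity token closes the segment before it.
    segments = [
        features[: left + 1],
        features[left + 1 : right + 1],
        features[right + 1 :],
    ]
    num_channels = len(features[0])
    pooled: List[float] = []
    for segment in segments:
        for c in range(num_channels):
            # An empty segment contributes zeros (also an assumption).
            pooled.append(max(tok[c] for tok in segment) if segment else 0.0)
    return pooled


# 5 tokens, 2 channels, entities at positions 1 and 3
fmap = [[1, 0], [3, -1], [2, 5], [0, 2], [4, 1]]
print(piecewise_max_pool(fmap, 1, 3))  # [3, 0, 2, 5, 4, 1]
```

The output length is fixed at three times the channel count regardless of sentence length, which is what lets a downstream classifier consume it directly.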



Examples


Embodiment 1

[0054] As shown in Figure 1, an embodiment of the present invention provides a remotely supervised Dual-Attention relationship classification method, including:

[0055] Align entity pairs in the knowledge base to the news corpus through remote supervision to construct entity pair sentence sets;
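A minimal sketch of the alignment step, assuming knowledge-base facts arrive as (head, relation, tail) triples and entity mentions are found by plain substring matching (real systems typically use entity linking). The names `Bag` and `build_entity_pair_bags` are hypothetical. Every sentence that mentions both entities of a known pair joins that pair's sentence set and inherits all of the pair's KB relations, which is exactly the heuristic that introduces label noise.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set, Tuple

EntityPair = Tuple[str, str]


@dataclass
class Bag:
    """Sentence set for one entity pair (hypothetical container)."""
    relations: Set[str] = field(default_factory=set)  # relations known in the KB
    sentences: List[str] = field(default_factory=list)  # aligned corpus sentences


def build_entity_pair_bags(
    kb_triples: Iterable[Tuple[str, str, str]], sentences: Iterable[str]
) -> Dict[EntityPair, Bag]:
    # Group every known relation by its (head, tail) entity pair.
    pair_relations: Dict[EntityPair, Set[str]] = defaultdict(set)
    for head, relation, tail in kb_triples:
        pair_relations[(head, tail)].add(relation)

    bags: Dict[EntityPair, Bag] = {}
    for sentence in sentences:
        for (head, tail), relations in pair_relations.items():
            # Distant-supervision assumption: co-occurrence implies the KB relations.
            if head in sentence and tail in sentence:
                bag = bags.setdefault((head, tail), Bag(relations=set(relations)))
                bag.sentences.append(sentence)
    return bags


kb = [("Jack Ma", "founder", "Alibaba"), ("Jack Ma", "CEO", "Alibaba")]
corpus = [
    "Alibaba CEO Jack Ma said that he was interested in Yahoo.",
    "Yahoo reported quarterly results.",
]
bags = build_entity_pair_bags(kb, corpus)
print(len(bags[("Jack Ma", "Alibaba")].sentences))  # 1
```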

[0056] Perform word-level vector encoding on the sentence with a Bi-LSTM model based on the word-level attention mechanism to obtain the semantic feature encoding vector of the sentence;
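The word-level attention step can be sketched as attention pooling over the Bi-LSTM outputs. This assumes one common formulation (score each token as w·tanh(h_t), normalize with softmax, and take the weighted sum of the hidden states); the excerpt does not give the patent's exact equations, and the LSTM recurrences themselves are omitted. `w` stands in for a trained attention vector, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def concat_directions(
    forward: Sequence[Sequence[float]], backward: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Join forward and backward LSTM states token by token: h_t = [f_t; b_t]."""
    return [list(f) + list(b) for f, b in zip(forward, backward)]


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def word_attention(
    hidden: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Return (sentence vector, per-token attention weights)."""
    # Score each token: w . tanh(h_t)
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in hidden]
    alpha = softmax(scores)
    dim = len(hidden[0])
    # Sentence encoding: attention-weighted sum of the hidden states.
    sentence_vec = [sum(a * h[d] for a, h in zip(alpha, hidden)) for d in range(dim)]
    return sentence_vec, alpha


H = concat_directions([[1.0], [0.0]], [[0.0], [1.0]])  # two tokens, 2-dim states
vec, weights = word_attention(H, w=[0.0, 0.0])
print(vec, weights)  # [0.5, 0.5] [0.5, 0.5]
```

With a zero attention vector every token is weighted equally; a trained `w` shifts weight toward the tokens most indicative of the relation.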

[0057] Encode and denoise the semantic features of the sentences with a Bi-LSTM model based on the sentence-level attention mechanism to obtain the feature encoding vector of the sentence set;
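Sentence-level attention is what performs the denoising: each sentence vector in an entity pair's set is scored against a relation query, and low-scoring sentences contribute little to the set encoding. The sketch assumes a plain dot-product score between a sentence vector x_i and a relation query vector r (selective-attention models often insert a learned matrix, x_i A r, which is omitted here). `query` and the function name are hypothetical, and the sentence-level Bi-LSTM that the patent mentions is not modeled.

```python
import math
from typing import List, Sequence, Tuple


def sentence_attention(
    sentence_vecs: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Weight the sentences of one entity-pair set by relevance to a relation query.

    Returns (set encoding vector, per-sentence weights).
    """
    # Relevance score e_i = x_i . r (dot product; an assumption, see lead-in)
    scores = [sum(x_d * r_d for x_d, r_d in zip(x, query)) for x in sentence_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(sentence_vecs[0])
    # Set encoding: weighted sum, dominated by the high-scoring sentences.
    set_vec = [sum(a * x[d] for a, x in zip(alpha, sentence_vecs)) for d in range(dim)]
    return set_vec, alpha


# Sentence 0 aligns with the query; sentence 1 (a noisy match) does not.
vec, alpha = sentence_attention([[1.0, 0.0], [0.0, 1.0]], query=[10.0, 0.0])
print([round(a, 4) for a in alpha])  # [1.0, 0.0]
```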

[0058] Package the sentence set feature encoding vector and the entity pair translation vector into a bag feature, and perform relation classification of the entity pair on that bag feature.
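The final step can be sketched as follows, assuming a TransE-style translation vector (tail embedding minus head embedding), concatenation as the packaging operation, and a linear softmax classifier over relation labels. All three are assumptions for illustration; the excerpt does not define the translation vector or the classifier, and the weights, embeddings, and the `NA` (no relation) label below are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def translation_vector(head: Sequence[float], tail: Sequence[float]) -> List[float]:
    """TransE-style relation estimate: r = t - h (assumed, not from the patent)."""
    return [t - h for h, t in zip(head, tail)]


def classify_bag(
    set_vec: Sequence[float],
    head_emb: Sequence[float],
    tail_emb: Sequence[float],
    weights: Dict[str, Sequence[float]],
    bias: Dict[str, float],
) -> Dict[str, float]:
    """Return a probability for each relation label."""
    # "Package" the set encoding and the translation vector by concatenation.
    feature = list(set_vec) + translation_vector(head_emb, tail_emb)
    logits = {}
    for label, w in weights.items():
        if len(w) != len(feature):
            raise ValueError(f"weight size mismatch for {label!r}")
        logits[label] = sum(wi * fi for wi, fi in zip(w, feature)) + bias.get(label, 0.0)
    m = max(logits.values())
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


probs = classify_bag(
    set_vec=[1.0, 0.0],
    head_emb=[0.0, 0.0],
    tail_emb=[1.0, 1.0],
    weights={"founder": [1.0, 0.0, 1.0, 1.0], "NA": [0.0, 0.0, 0.0, 0.0]},
    bias={},
)
print(max(probs, key=probs.get))  # founder
```

Including the translation vector gives the classifier a signal about the entity pair itself, independent of how noisy the aligned sentences are.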

[0059] Figure 2 shows a detailed flow chart of an embodiment of the present invention.

[0060] Preferably, by aligning...

Embodiment 2

[0081] Based on the same inventive concept, the present invention also provides a remotely supervised Dual-Attention relation classification system, including:

[0082] A building module, configured to align entity pairs in the knowledge base to the news corpus through remote supervision and construct entity-pair sentence sets;

[0083] A first vector module, configured to perform word-level vector encoding on the sentence with the Bi-LSTM model based on the word-level attention mechanism, to obtain the semantic feature encoding vector of the sentence;

[0084] A second vector module, configured to encode and denoise the semantic features of the sentence with the Bi-LSTM model based on the sentence-level attention mechanism, to obtain the sentence set feature encoding vector;

[0085] A relationship classification module, configured to package the sentence set feature encoding vector and the entity pair translation vector, and to perform the relationship classification of the entity pair on the obtai...

Embodiment 3

[0105] In the knowledge base Wikidata, the entity pair ("Jack Ma", "Alibaba") and its corresponding relation set ("founder", "CEO", and others) are known. Aligning Internet data to the entity pair "Jack Ma" and "Alibaba" yields several sentences; below are four example sentences in which the two entities co-occur.

[0106] Sentence 1: "Female executives are Alibaba's secret sauce, founder Jack Ma says."

[0107] Sentence 2: "At a conference hosted by All Things D last week, Alibaba CEO Jack Ma said that he was interested in Yahoo."

[0108] Sentence 3: "Internet entrepreneur Jack Ma started a Chinese version of the Yellow Pages that was Alibaba's precursor in Hangzhou, China."

[0109] Sentence 4: "Alibaba has brought more small U.S. businesses onto the company's sites, but this is the first time Ma has discussed specific targets."

[0110] Sentences 3 and 4 do not express the relations predefined in the knowledge base. One of the purposes of the present...
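The example shows why sentence-level denoising is needed. Under the plain distant-supervision heuristic, every sentence that mentions both entities inherits the pair's full relation set, so sentences 3 and 4 are labeled "founder" and "CEO" even though, per the text above, they do not express those relations. The sketch assumes a hypothetical alias table so that the bare surname "Ma" in sentence 4 links to "Jack Ma", and uses whole-word regex matching in place of real entity linking.

```python
import re
from typing import Dict, List, Set, Tuple

# Relations known in the knowledge base for the pair (from the example).
KB_RELATIONS: Dict[Tuple[str, str], Set[str]] = {("Jack Ma", "Alibaba"): {"founder", "CEO"}}
# Hypothetical alias table standing in for entity linking.
ALIASES: Dict[str, List[str]] = {"Jack Ma": ["Jack Ma", "Ma"], "Alibaba": ["Alibaba"]}

SENTENCES = [
    "Female executives are Alibaba's secret sauce, founder Jack Ma says.",
    "At a conference hosted by All Things D last week, Alibaba CEO Jack Ma said "
    "that he was interested in Yahoo.",
    "Internet entrepreneur Jack Ma started a Chinese version of the Yellow Pages "
    "that was Alibaba's precursor in Hangzhou, China.",
    "Alibaba has brought more small U.S. businesses onto the company's sites, "
    "but this is the first time Ma has discussed specific targets.",
]


def mentions(entity: str, sentence: str) -> bool:
    # Whole-word match on any alias, so "Ma" does not match inside "Management".
    return any(re.search(rf"\b{re.escape(a)}\b", sentence) for a in ALIASES[entity])


def distant_labels(sentences: List[str]) -> List[Tuple[int, Set[str]]]:
    """Label each sentence number with every KB relation of the pairs it mentions."""
    labeled = []
    for i, sentence in enumerate(sentences):
        for (head, tail), relations in KB_RELATIONS.items():
            if mentions(head, sentence) and mentions(tail, sentence):
                labeled.append((i + 1, set(relations)))
    return labeled


for number, relations in distant_labels(SENTENCES):
    print(number, sorted(relations))
# Every sentence gets ['CEO', 'founder'], including the noisy sentences 3 and 4.
```

Half of the resulting labels are wrong for this pair, which is the kind of training noise the sentence-level attention step is meant to down-weight.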



Abstract

The invention relates to a remotely supervised Dual-Attention relation classification method and system. The method comprises the following steps: aligning entity pairs in a knowledge base to a news corpus through remote supervision and constructing entity-pair sentence sets; performing word-level vector encoding on each sentence through a Bi-LSTM model based on a word-level attention mechanism to obtain the semantic feature encoding vector of the sentence; encoding and denoising the semantic features of the sentences through a Bi-LSTM model based on a sentence-level attention mechanism to obtain a sentence set feature encoding vector; and packing the sentence set feature encoding vector and the entity pair translation vector, and performing relation classification of the entity pair on the resulting packet feature. The technical scheme provided by the invention reduces noisy data in model training and avoids manual data annotation and the error propagation it causes. Entity alignment is performed using open-domain text and a large-scale knowledge base, which effectively addresses the problem of annotated-data scale in relation extraction.

Description

Technical field

[0001] The invention belongs to the field of relation classification, and in particular relates to a remotely supervised Dual-Attention relation classification method and system.

Background

[0002] With the development of Internet technology, the amount of text information on the World Wide Web has grown rapidly, and techniques for automatically extracting knowledge from text have attracted increasing attention and become a research hotspot. The current mainstream relation extraction method is relation classification based on neural network learning, which faces three major problems: the difficulty of representing and mining semantic features, error propagation caused by manual annotation, and the effect of noise on model training. At present, among the relation classification methods based on neural network learning, the relationship classification methods that achieve the best results appear in...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/35; G06F16/36; G06F40/295
CPC: G06F40/295
Inventors: He Min (贺敏), Mao Qianren (毛乾任), Wang Lihong (王丽宏), Li Chen (李晨)
Owner: NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT