Pre-training model-based domain map entity and relationship joint extraction method and system

A joint extraction technique based on a pre-trained model, applied in the field of big data. It addresses the cumbersome graph construction process, the neglect of important features shared between extraction steps, and the inability to train those steps jointly. Its effects are lower labor costs, reduced complexity, and a method that is easy to understand.

Active Publication Date: 2021-12-10
EAST CHINA NORMAL UNIV

AI Technical Summary

Problems solved by technology

The existing step-by-step approach ignores the important features shared between steps, which makes the graph construction process cumbersome. It also cannot be trained jointly or complete the extraction work within a single task.



Examples


Embodiment 1

[0066] Referring to Figure 1, the operating flow of the method of the present invention is illustrated.

[0067] The knowledge graph entity and relation extraction method based on the pre-trained model described in this embodiment includes the following steps:

[0068] (1) Obtain the original data, label it, and divide it into a training set and a test set, establishing a preliminary small-scale insurance labeled data set U and a candidate relationship set V. This specifically includes the following steps:

[0069] (1.1) Collect insurance-domain text from the websites of insurance companies: use a crawler to fetch the product introductions and comparative analyses on specific insurance websites, and save them in a unified text format;

[0070] (1.2) Data cleaning: select the key paragraphs from the acquired text and remove useless information such as headers, footers and pictures. Small-scale labeling: select some representative sen...
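Step (1) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `[img:...]` marker, the sentence-splitting rule, the 80/20 split ratio, and the sample sentences and relation names are all hypothetical.

```python
import random
import re

def clean_text(raw):
    """Drop image placeholders and split the text into sentences."""
    # "[img:...]" is a hypothetical marker standing in for picture residue.
    text = re.sub(r"\[img:[^\]]*\]", " ", raw)
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text)]
    return [s for s in sentences if s]

def split_labeled(samples, test_ratio=0.2, seed=0):
    """Shuffle labeled samples and split them into (train, test)."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

raw_page = "[img:banner.png] Plan A covers critical illness. The waiting period is 90 days."
sentences = clean_text(raw_page)

# Labeled data set U: each sample pairs a sentence with (head, relation, tail) triples.
U = [
    {"text": "Plan A covers critical illness.",
     "triples": [("Plan A", "covers", "critical illness")]},
    {"text": "Plan A has a waiting period of 90 days.",
     "triples": [("Plan A", "waiting period", "90 days")]},
    {"text": "Plan B covers accidental injury.",
     "triples": [("Plan B", "covers", "accidental injury")]},
    {"text": "Plan B is issued by Company X.",
     "triples": [("Plan B", "issued by", "Company X")]},
    {"text": "Plan C has a waiting period of 30 days.",
     "triples": [("Plan C", "waiting period", "30 days")]},
]

# Candidate relationship set V: every relation that appears in the labels.
V = sorted({rel for sample in U for _, rel, _ in sample["triples"]})

train_set, test_set = split_labeled(U)
```

Deriving V from the labels keeps the candidate relations consistent with the annotated data; a curated relation list would be an equally plausible choice.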

Embodiment 2

[0081] Referring to Figure 2, the model architecture used for relationship and entity pair extraction can be divided into three modules:

[0082] (1) Pre-training model encoding module:

[0083] The pre-training model encoding module effectively captures contextual semantic information. The sentence S = [w_1, ..., w_n], where n is the length of the sentence, is used as the input of the pre-training model to obtain the feature vector representation of the sentence sequence. To obtain the contextual representation x_i of each token w_i in the sentence, different Transformer-based networks can be used. In the present invention, the pre-training model BERT is used as the basic encoder, although the method is not limited to BERT. The BERT output is as follows:

[0084] {x_1, ..., x_n} = BERT({w_1, ..., w_n})

[0085] As in common practice, for the feature encoding x_i of each word in the sentence, the corresponding token, segment and position information are sum...
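The standard BERT input representation adds a token embedding, a segment embedding and a position embedding element-wise for each position. The sketch below illustrates only that sum, assuming tiny random tables in place of learned weights. The table sizes, ids and helper names are hypothetical, and the Transformer layers that turn these inputs into the contextual vectors x_i are omitted.

```python
import random

def make_table(rows, dim, rng):
    """Build a small random embedding table (stand-in for learned weights)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(rows)]

def input_embeddings(token_ids, segment_ids, tok_table, seg_table, pos_table):
    """Sum token, segment and position embeddings element-wise per position."""
    vectors = []
    for pos, (tok, seg) in enumerate(zip(token_ids, segment_ids)):
        vectors.append([t + s + p for t, s, p in
                        zip(tok_table[tok], seg_table[seg], pos_table[pos])])
    return vectors

rng = random.Random(0)
dim = 4                                # toy hidden size
tok_table = make_table(10, dim, rng)   # toy vocabulary of 10 tokens
seg_table = make_table(2, dim, rng)    # two segment types
pos_table = make_table(8, dim, rng)    # up to 8 positions

token_ids = [1, 5, 7, 2]               # hypothetical ids for w_1..w_4
segment_ids = [0, 0, 0, 0]             # single-sentence input
embedded = input_embeddings(token_ids, segment_ids, tok_table, seg_table, pos_table)
```

In a real encoder these summed vectors would pass through layer normalization and the stacked Transformer layers before being used as x_1, ..., x_n.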

Embodiment 3

[0100] Referring to Figure 3, the proportion of each relationship in the finally extracted triple data is shown.

[0101] The original text concerns products in the insurance field and is highly targeted. The description of an insurance product contains a limited number of common relationship types, so the model achieves good results in actual extraction. The resulting proportions are shown in Figure 3.

[0102] The most common relationship types are generally the first dozen or so, after which the frequency drops sharply. All remaining low-frequency relationships are grouped as "other", whose proportion is almost the same as that of the most frequent relationship. This shows that when constructing a graph in a specific field, the relationship types are likely to be relatively concentrated, which helps researchers use the data for subsequent research and analysis.
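The grouping described above, keeping the most frequent relationship types and merging the rest into "other", can be computed along these lines. This is a small sketch: the triples and relation names are made up, and `top_k` is an assumed parameter rather than a value from the patent.

```python
from collections import Counter

def relation_proportions(triples, top_k=12, other_label="other"):
    """Return the share of each of the top_k relations, with the rest merged."""
    counts = Counter(rel for _, rel, _ in triples)
    total = sum(counts.values())
    if total == 0:
        return {}
    top = counts.most_common(top_k)
    shares = {rel: n / total for rel, n in top}
    rest = total - sum(n for _, n in top)
    if rest:
        shares[other_label] = rest / total
    return shares

# Hypothetical extracted triples: (head, relation, tail).
sample_triples = (
    [("Plan A", "covers", f"item {i}") for i in range(4)]
    + [("Plan A", "insured age", f"range {i}") for i in range(3)]
    + [("Plan A", "waiting period", "90 days"),
       ("Plan A", "payment term", "10 years"),
       ("Plan A", "exclusion", "pre-existing conditions")]
)

shares = relation_proportions(sample_triples, top_k=2)
```

With `top_k=2`, the three relations that each occur once are merged into a single "other" bucket.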


Abstract

The invention discloses a method for joint extraction of domain knowledge graph entities and relationships based on a pre-training model. The method comprises the following steps: A, capturing insurance-domain text from the websites of insurance companies, cleaning and labeling the data, and establishing an initial data set U and a candidate relationship set V; B, based on the pre-training model, constructing a joint learning framework for relation discrimination and entity pair extraction, and training and testing the model; C, screening the newly extracted data generated during testing and using it to augment the training set; D, iterating with the updated data set until the model is stable; and E, exporting the triple data and constructing a domain knowledge graph. The invention also provides a system for realizing the method. In this method, the target relation interacts with each word of the text, all possible entity pairs are accurately generated, the problem of entity overlap is naturally avoided, and multiple relations and multiple entity pairs can be extracted at the same time.
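The iteration in steps B to D can be sketched as a simple loop. Everything below is a toy stand-in under stated assumptions: `train`, `extract` and `screen` are hypothetical placeholders for the joint model, its inference and the screening rule, and "stable" is approximated here as "no new data was added in a round".

```python
def train(train_set):
    """Toy stand-in for training: memorize relation phrases seen in the labels."""
    return {rel for _, triples in train_set for _, rel, _ in triples}

def extract(model, sentence):
    """Toy stand-in for joint extraction: split around a known relation phrase."""
    found = []
    for rel in sorted(model):
        head, sep, tail = sentence.partition(f" {rel} ")
        if sep:
            found.append((head.strip(), rel, tail.strip(" .")))
    return found

def screen(triples):
    """Placeholder screening rule: keep triples with non-empty head and tail."""
    return [t for t in triples if t[0] and t[2]]

def bootstrap(train_set, unlabeled, max_rounds=5):
    """Repeat train, extract, screen and augment until no new data is added."""
    train_set = list(train_set)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = train(train_set)
        newly_labeled, remaining = [], []
        for sentence in pool:
            kept = screen(extract(model, sentence))
            if kept:
                newly_labeled.append((sentence, kept))
            else:
                remaining.append(sentence)
        if not newly_labeled:      # nothing new: treat the model as stable
            break
        train_set.extend(newly_labeled)
        pool = remaining
    # Export the de-duplicated triples (step E).
    return sorted({t for _, triples in train_set for t in triples})

seed_set = [("Plan A covers critical illness.",
             [("Plan A", "covers", "critical illness")])]
unlabeled = ["Plan B covers accidental injury.", "Contact us for a quote."]
exported = bootstrap(seed_set, unlabeled)
```

Because the toy model only memorizes relation phrases, it stabilizes after one productive round. A learned model could keep finding new triples over several rounds, which is why the loop is bounded by `max_rounds`.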

Description

Technical field

[0001] The invention belongs to the field of big data technology, and relates to a method and system for joint extraction of domain map entities and relationships based on a pre-training model, which is used for deep learning research and analysis related to the acquisition of domain map triplet data.

Background technique

[0002] With the development of the mobile Internet, the Internet of Everything has become possible, and the data generated by this interconnection is also growing explosively, and these data can just be used as effective raw materials for analyzing relationships. In the era of mobile Internet, the relationship between individuals will inevitably become a very important part of our in-depth analysis. As long as there is a need for relational analysis, knowledge graphs are "possibly" useful. From the initial Google search to the current chatbots, big data risk control, securities investment, smart medical care, adaptive education, and recom...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/36, G06F16/28, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/367, G06F16/288, G06N3/08, G06N3/044, G06N3/045, G06F18/2415, G06F18/214
Inventor: 朱静丹, 姚俊杰
Owner: EAST CHINA NORMAL UNIV