Multi-type entity recognition multi-task deep learning model training method and device

A deep learning and entity recognition technology, applied to multi-task deep learning model training, addresses the high error rate, shallow data extraction, and long extraction time of existing approaches. It improves overall correctness, identifies and extracts entities accurately, and generalizes well.

Active Publication Date: 2018-11-30
吉奥时空信息技术股份有限公司

AI Technical Summary

Problems solved by technology

[0006] In view of the above problems, the purpose of the present invention is to provide a training method and device for a multi-task deep learning model for multi-type entity recognition, aiming to solve the technical problems that existing techniques cannot extract data in depth and that the data extraction process takes a long time and has a high error rate.

Examples

Embodiment 1

[0043] As shown in Figure 1, the training method for the multi-task deep learning model for multi-type entity recognition provided by this embodiment of the present invention includes the following steps:

[0044] Step S1, data preprocessing: perform data cleaning operations on all acquired text corpus data according to requirements;

[0045] The preprocessing in step S1 mainly removes invalid characters, spaces, line breaks, and the like from the text corpus data, or strips web page formatting from text corpus data obtained from web pages, thereby purifying the text data.
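
The cleaning described in step S1 might look roughly like the following minimal sketch. The function name `clean_text`, the regular expressions, and the choice to delete all whitespace (reasonable for Chinese text, which has no word-separating spaces) are illustrative assumptions, not details taken from the patent.

```python
import html
import re

# All patterns below are illustrative assumptions, not the patent's rules.
_TAG_RE = re.compile(r"<[^>]+>")   # crude matcher for HTML/XML tags
_INVALID_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\ufffd]")  # control chars, U+FFFD
_WS_RE = re.compile(r"\s+")        # spaces, tabs, line breaks (Unicode-aware)


def clean_text(raw: str) -> str:
    """Strip web page markup, invalid characters, and whitespace from raw text."""
    text = _TAG_RE.sub("", raw)        # remove web page formatting
    text = html.unescape(text)         # decode entities such as &nbsp;
    text = _INVALID_RE.sub("", text)   # drop invalid characters
    text = _WS_RE.sub("", text)        # drop spaces and line breaks
    return text
```

For example, `clean_text("<p>武汉 市\n</p>")` yields `武汉市`. A production pipeline would likely use a real HTML parser rather than a regular expression for tag removal.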

[0046] Step S2, establishing a skip-gram neural network model to convert the preprocessed text corpus data into vectors;

[0047] The specific process of step S2 is as follows:

[0048] Train the skip-gram neural network model to obtain a fixed character feature vector file named vocb, in which each character is converted, according to its semantics, into a vector of the same length, the...
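
As a rough illustration of the skip-gram idea at character level, the sketch below shows only how (center, context) training pairs are formed from a cleaned string; it does not train the network or produce the vocb file. The helper name `skipgram_pairs` and the window size are assumptions for illustration; the patent text shown here does not specify them.

```python
from typing import List, Tuple


def skipgram_pairs(text: str, window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) character pairs within a symmetric window.

    A skip-gram model is trained to predict each context character from
    its center character; the learned input weights become the vectors.
    """
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(text):
        lo = max(0, i - window)               # left edge of the window
        hi = min(len(text), i + window + 1)   # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                        # skip the center itself
                pairs.append((center, text[j]))
    return pairs
```

With `window=1`, the string `"abc"` produces the pairs `("a","b")`, `("b","a")`, `("b","c")`, `("c","b")`.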

Embodiment 2

[0072] As shown in Figure 3, the present invention provides a training device for a multi-task deep learning model for multi-type entity recognition, which carries out the training method provided by the present invention. The training device includes:

[0073] A data processing unit, used for data cleaning: performs data cleaning operations on all acquired text corpus data according to requirements;

[0074] A conversion unit, used to establish the skip-gram neural network model and convert the preprocessed text corpus data into vectors;

[0075] A sample data construction unit, used to establish the data set: constructs a sample data set according to the entity types to be identified and extracted;

[0076] A word segmentation unit, used to construct the word segmentation features of the samples: the preprocessed text corpus data is segmented according to a single...

Abstract

The invention belongs to the technical field of data extraction and provides a training method and device for a multi-task deep learning model for multi-type entity recognition. The method comprises: preprocessing the data; establishing a skip-gram neural network model to convert the preprocessed text corpus data into vectors; constructing a sample data set according to the entity types to be recognized and extracted; constructing word segmentation features for the sample data; and establishing the multi-task deep learning model for multi-type entity recognition. The method extracts the features common to related entity types through parameter sharing and labels each entity type with an independent model, so the model generalizes better when recognizing and extracting multiple types of entities from text data; that is, the overall correctness of entity recognition improves. In addition, only one model is trained, and the common features need to be trained only once per iteration, which can greatly reduce training time.
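
The parameter-sharing arrangement described in the abstract can be pictured with a generic hard-parameter-sharing sketch: one shared feature layer feeds an independent output layer per entity type. This is not the patent's architecture, which the text shown here does not detail; the class name `MultiTaskTagger`, the task names, the layer sizes, and the ReLU activation are all assumptions, and no training loop is included.

```python
import random
from typing import Dict, List

Vector = List[float]


def linear(weights: List[Vector], x: Vector) -> Vector:
    """Multiply a weight matrix (one row per output) by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def init_matrix(rows: int, cols: int, rng: random.Random) -> List[Vector]:
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class MultiTaskTagger:
    """Shared feature layer plus one independent head per entity type."""

    def __init__(self, in_dim: int, hidden_dim: int,
                 tags_per_task: Dict[str, int], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.shared = init_matrix(hidden_dim, in_dim, rng)   # shared parameters
        self.heads = {task: init_matrix(n_tags, hidden_dim, rng)
                      for task, n_tags in tags_per_task.items()}
        self.shared_calls = 0  # counts how often shared features are computed

    def shared_features(self, x: Vector) -> Vector:
        self.shared_calls += 1
        return [max(0.0, h) for h in linear(self.shared, x)]  # ReLU

    def forward(self, x: Vector) -> Dict[str, Vector]:
        h = self.shared_features(x)  # computed once, reused by every head
        return {task: linear(w, h) for task, w in self.heads.items()}
```

Calling `MultiTaskTagger(4, 8, {"person": 3, "place": 3}).forward([0.1, 0.2, 0.3, 0.4])` returns one score vector per entity type while evaluating the shared layer a single time, which is the source of the training-time saving the abstract describes.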

Description

Technical field

[0001] The invention belongs to the technical field of data extraction, and in particular relates to a training method and device for a multi-task deep learning model for multi-type entity recognition.

Background

[0002] Entity recognition and extraction refers to identifying and extracting words with specific meanings from text, mainly names of people, places, and institutions, proper nouns, and so on. Entities can differ in type in two ways. In the first, entities such as the names of people and places mentioned above belong to different types because their characteristics differ. In the second, entities that are all personal names or all place names are treated as different types because their meanings differ. Different types of entities are generally ubiquitous in a piece of text data, but conventional entity recognition tools or methods can only identify names of people...

Application Information

IPC(8): G06F17/27, G06N3/08
CPC: G06N3/08, G06F40/279
Inventor: 吴杰杨曦沈满刘奕夫周游宇布恒
Owner: 吉奥时空信息技术股份有限公司