Entity-identification cooperative learning algorithm based on document type

A document type and entity recognition technology in the computer field. It addresses the shortcoming that existing methods consider either algorithm diversity or data set diversity, but not both, and improves accuracy.

Active Publication Date: 2015-10-14
BEIJING UNIV OF TECH
Cites 5 documents; cited by 19

AI Technical Summary

Problems solved by technology

This algorithm adopts an integrated (ensemble) learning model. Although it considers the diversity of the learners, every training cycle trains the learners on the same data set, so the diversity of the data set is not considered. This is its shortcoming.
[0006] Although the existing methods introduced above all improve accuracy on the labeling problem, each considers only one aspect: either the diversity of the data set but not the diversity of the algorithm, or the diversity of the algorithm but not the diversity of the data set. They therefore cannot meet the needs of both aspects.

Examples

Embodiment Construction

[0022] Features and exemplary embodiments of various aspects of the invention are described in detail below.

[0023] As shown in the block diagram of Figure 1, the document type-based entity recognition cooperative learning system includes a data set preprocessing device (1), a document type-based classifier construction device (2), a text classifier construction device (3), and a model application device (4). The data set preprocessing device (1) is connected to the document type-based classifier construction device (2), and the document type-based classifier construction device (2) and the text classifier construction device (3) are both connected to the model application device (4).
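Read together with the abstract, which splits model application into document type identification followed by document type-based entity recognition, these connections suggest that device (4) uses the classifier built by device (3) to choose one of the type-specific recognizers built by device (2). The following is a minimal Python sketch of that routing step under stated assumptions: documents are token lists, `apply_model` and all toy classifiers and taggers are hypothetical stand-ins (keyword rules, not trained models), and the fallback tagger for an unseen document type is an assumption the source does not describe.

```python
from typing import Callable, Dict, List

Doc = List[str]                        # assumption: a document is a list of tokens
Tagger = Callable[[Doc], List[str]]    # assumption: token-level BIO entity tagger
TypeClassifier = Callable[[Doc], str]  # assumption: returns a document-type label


def apply_model(doc: Doc,
                type_classifier: TypeClassifier,
                taggers_by_type: Dict[str, Tagger],
                fallback: Tagger) -> List[str]:
    """Hypothetical model application device (4).

    type_classifier stands in for the output of device (3);
    taggers_by_type stands in for the per-type recognizers of device (2).
    """
    doc_type = type_classifier(doc)                  # step 1: identify document type
    tagger = taggers_by_type.get(doc_type, fallback)  # assumed fallback for unseen types
    return tagger(doc)                               # step 2: type-specific recognition


# Toy stand-ins (hypothetical keyword rules) that only make the wiring runnable.
def toy_type_classifier(doc: Doc) -> str:
    return "medical" if "patient" in doc else "news"


def medical_tagger(doc: Doc) -> List[str]:
    return ["B-DISEASE" if tok == "diabetes" else "O" for tok in doc]


def news_tagger(doc: Doc) -> List[str]:
    return ["B-LOC" if tok == "Beijing" else "O" for tok in doc]


def all_outside(doc: Doc) -> List[str]:
    return ["O"] * len(doc)


taggers = {"medical": medical_tagger, "news": news_tagger}
print(apply_model(["patient", "has", "diabetes"], toy_type_classifier, taggers, all_outside))
# ['O', 'O', 'B-DISEASE']
```

Routing each document to a single type-specific recognizer keeps the per-type models independent, which matches the abstract's point that the data set is segmented by document type before the recognizers are built.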

[0024] Data set preprocessing device (1): extracts all labeled entities from the sparsely labeled training corpus to form a dictionary, and uses the dictionary to update each training document in the training corpus;

[0025] Document type-based classifier construction device (2): Under the cond...
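The preprocessing in [0024] (build an entity dictionary from a sparsely labeled corpus, then update each training document with it) can be sketched in Python. This is a minimal sketch under assumptions the source does not state: documents are lists of `(token, BIO tag)` pairs with `"O"` for unlabeled tokens, "update" is read as labeling further occurrences of dictionary entries that were left unlabeled, matching is longest-first, existing labels are never overwritten, and when one surface form appears with several entity types the last one seen wins. The function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Doc = List[Tuple[str, str]]  # assumption: (token, BIO tag) pairs, "O" = unlabeled


def build_dictionary(corpus: List[Doc]) -> Dict[Tuple[str, ...], str]:
    """Collect every labeled entity span as token tuple -> entity type."""
    dictionary: Dict[Tuple[str, ...], str] = {}
    for doc in corpus:
        span: List[str] = []
        etype: Optional[str] = None
        for token, tag in doc + [("", "O")]:  # sentinel flushes the final span
            if tag.startswith("I-") and span and tag[2:] == etype:
                span.append(token)  # continue the current entity
                continue
            if span:  # any other tag ends the current entity
                dictionary[tuple(span)] = etype
                span, etype = [], None
            if tag.startswith("B-"):
                span, etype = [token], tag[2:]
    return dictionary


def update_document(doc: Doc, dictionary: Dict[Tuple[str, ...], str]) -> Doc:
    """Label unlabeled occurrences of dictionary entries, longest match first."""
    tokens = [tok for tok, _ in doc]
    tags = [tag for _, tag in doc]
    max_len = max((len(key) for key in dictionary), default=0)
    i = 0
    while i < len(tokens):
        matched = 0
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            # only fill spans that are still entirely unlabeled
            if key in dictionary and all(t == "O" for t in tags[i:i + n]):
                etype = dictionary[key]
                tags[i:i + n] = ["B-" + etype] + ["I-" + etype] * (n - 1)
                matched = n
                break
        i += matched or 1
    return list(zip(tokens, tags))


labeled = [("Li", "B-PER"), ("visited", "O"), ("New", "B-LOC"), ("York", "I-LOC")]
sparse = [("New", "O"), ("York", "O"), ("welcomed", "O"), ("Li", "O")]
dictionary = build_dictionary([labeled, sparse])
print(update_document(sparse, dictionary))
# [('New', 'B-LOC'), ('York', 'I-LOC'), ('welcomed', 'O'), ('Li', 'B-PER')]
```

Matching longest entries first keeps a multi-token entity such as `New York` from being split by a shorter dictionary entry that happens to start at the same token.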

Abstract

The invention discloses a document type-based entity recognition cooperative learning algorithm comprising a model building module and a model application module. The model building module comprises building document type-based entity recognition classifiers and building a text classifier. The model application module comprises document type identification and document type-based entity recognition. The algorithm combines an integrated (ensemble) learning algorithm with a cooperative training (co-training) algorithm. During data set preprocessing, the data set is segmented by document type, so the diversity of the data sets is considered. During model building, sparsely labeled data are used as training data, and several base algorithms are applied and combined through integrated learning, so the diversity of the algorithms is considered. By combining these techniques and considering the diversity of both the algorithms and the data sets, the algorithm is designed to achieve satisfactory results on the entity recognition task.
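The abstract says several base algorithms are combined through integrated learning but names neither the algorithms nor the combination rule. The Python sketch below assumes token-level majority voting, a common ensemble rule, over tag sequences from hypothetical base taggers (`tagger_a` to `tagger_c` are placeholder outputs, not results from the patent). It breaks ties in favor of the tagger listed first, which is also an assumption, and it does not model the co-training part.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[List[str]]) -> List[str]:
    """Combine per-token tag sequences from several base taggers.

    Assumption: ties go to the tag from the earliest-listed tagger.
    """
    if not predictions:
        raise ValueError("need predictions from at least one base tagger")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same tokens")
    combined = []
    for i in range(length):
        column = [p[i] for p in predictions]  # every tagger's vote for token i
        counts = Counter(column)
        best = max(counts.values())
        # first tag, in tagger order, that reaches the top vote count
        combined.append(next(tag for tag in column if counts[tag] == best))
    return combined


# Hypothetical outputs of three base taggers on the same three tokens.
tagger_a = ["B-PER", "O", "B-LOC"]
tagger_b = ["B-PER", "O", "O"]
tagger_c = ["O", "O", "B-LOC"]
print(majority_vote([tagger_a, tagger_b, tagger_c]))
# ['B-PER', 'O', 'B-LOC']
```

Voting token by token is the simplest combination, but it can produce an `I-` tag with no preceding `B-`; a fuller system might vote over whole entity spans or repair the sequence afterwards.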

Description

Technical field

[0001] The invention belongs to the field of computers and, more specifically, relates to a document type-based entity recognition cooperative learning algorithm that can effectively improve the accuracy of entity recognition.

Background art

[0002] With the development of the information industry, computer networks are growing in scale day by day, and a large amount of information reaches people in the form of electronic documents. At the same time, companies and enterprises accumulate large amounts of data in this form. Most of the potentially useful information in these data exists in unstructured form, and the accumulation of such large volumes of free text also poses challenges for applying semantic technology in enterprise environments. There is therefore an urgent need for techniques to process this information. Entity linking is attracting increasing attention as a technique capable of linking a word ...

Application Information

Patent type and authority: Applications (China)
IPC(8): G06K9/62; G06N5/02
CPC: G06N5/025; G06F18/2415
Inventors: 孙靖超 (Sun Jingchao), 李建强 (Li Jianqiang), 刘璐 (Liu Lu), 赵旭 (Zhao Xu), 莫豪文 (Mo Haowen), 田猛 (Tian Meng)
Owner: BEIJING UNIV OF TECH