A mixed corpus named entity recognition method based on Bi-LSTM-CNN

A named entity recognition and mixed-corpus technique, applied in the information technology field, that addresses vanishing gradients during training, the low recognition rate for unregistered (out-of-vocabulary) words, and the lack of a clear advantage in final named entity recognition results. It improves accuracy and avoids the unregistered-word problem.

Active Publication Date: 2021-02-09
北京知道未来信息技术有限公司

AI Technical Summary

Problems solved by technology

[0012] Disadvantage 1: The detection granularity for multiple languages is hard to determine, and word segmentation accuracy is lost when one of the languages goes undetected.
[0013] Disadvantage 2: HMM (hidden Markov model) and CRF (conditional random field) methods based on word frequency statistics can only relate the current word to the semantics of the previous word, so recognition accuracy is not high enough; in particular, the recognition rate for unregistered words is low.
[0014] Disadvantage 3: Methods based on artificial neural network models suffer from vanishing gradients during training; in practice the number of network layers is small, and the final named entity recognition results show no obvious advantage.
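
The vanishing-gradient problem named in Disadvantage 3 can be illustrated with a little arithmetic. The sketch below is not taken from the patent: it assumes sigmoid activations and unit weights (neither is specified in the source) and multiplies the per-layer derivative across a chain of layers. The function names `sigmoid_grad` and `gradient_scale` are hypothetical.

```python
import math


def sigmoid_grad(x: float) -> float:
    """Derivative of the logistic sigmoid at x; its maximum is 0.25 at x = 0."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def gradient_scale(depth: int, pre_activation: float = 0.0) -> float:
    """Factor by which a gradient shrinks after passing through `depth`
    sigmoid layers, ASSUMING unit weights and the same pre-activation
    at every layer (an illustrative simplification)."""
    return sigmoid_grad(pre_activation) ** depth


if __name__ == "__main__":
    # Even in the best case (derivative 0.25), the factor decays geometrically.
    for depth in (1, 5, 10, 20):
        print(depth, gradient_scale(depth))
```

Under these assumptions the factor falls below one millionth after ten layers, which is consistent with the observation that such networks are kept shallow in practice.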


Examples


Detailed Description of Embodiments

[0045] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0046] The invention discloses a Bi-LSTM-CNN-based named entity recognition method for mixed corpora, for example, one that identifies named entities such as person names, place names, and organization names in corpus data that mixes multiple languages. The invention addresses three core problems: (1) the efficiency of mixed corpus recognition, (2) the precision of named entity recognition, and (3) the recognition precision for unregistered words.

[0047] To solve the problem of unregistered words, the present invention abandons the traditional vocabulary (dictionary) method and instead adopts a vector-based approach, using character-level vectors rather than word-level vectors.
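
To make the character-level idea concrete, the following minimal sketch shows why a word never seen in training can still be encoded when the lookup is by character. It is an illustration only: the reserved symbols `<pad>` and `<unk>`, the helper names `build_char_vocab` and `encode`, and the sample sentences are all assumptions, not details from the patent.

```python
from typing import Dict, Iterable, List

# Hypothetical reserved symbols; the source does not name any.
PAD, UNK = "<pad>", "<unk>"


def build_char_vocab(sentences: Iterable[str]) -> Dict[str, int]:
    """Assign an integer id to every distinct character in the corpus."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for ch in sentence:
            if ch not in vocab:
                vocab[ch] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Map each character to its id, falling back to UNK if unseen."""
    return [vocab.get(ch, vocab[UNK]) for ch in sentence]


# Mixed-language toy corpus (invented for illustration).
train = ["张三 works in Beijing", "李四在北京工作"]
vocab = build_char_vocab(train)

# "李三" never occurs as a word in the training data, but both of its
# characters do, so no position falls back to UNK.
ids = encode("李三", vocab)
```

A word-level vocabulary built from the same two sentences would have no entry for "李三"; at character level the ids can then be fed to an embedding layer to obtain character vectors.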

[0048] In order to solve the problem of low precision of the...


Abstract

The invention relates to a Bi-LSTM-CNN-based mixed corpus named entity recognition method. In the training phase, the method converts the labeled mixed corpus training data into character-level mixed corpus data and then trains a deep learning model based on Bi-LSTM-CNN. In the prediction phase, it converts the unlabeled mixed corpus test data into character-level mixed corpus data and then uses the model trained in the training phase to make predictions. The invention uses character-level rather than word-level vectors, which avoids dependence on word segmentation accuracy and also avoids the problem of unregistered words. In addition, it adopts a model combining a bidirectional long short-term memory network (Bi-LSTM) with a convolutional neural network (CNN), which substantially improves accuracy compared with traditional algorithms. Because the mixed corpus is used directly for model training, the individual languages in the corpus do not need to be detected and separated, and the goal of recognizing entities in a mixed corpus is achieved.
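
The conversion of labeled data to character level described in the abstract could look roughly like the sketch below. The patent text shown here does not state the input format or the tagging scheme, so the `(token, label)` input and the BIO tags (`B-`, `I-`, `O`) are assumptions, and `to_char_level` is a hypothetical helper name.

```python
from typing import List, Tuple


def to_char_level(tokens: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand (token, label) pairs into (character, tag) pairs.

    `label` is an entity type such as "PER" or "LOC", or "O" for text
    outside any entity. The BIO scheme is an ASSUMPTION: the first
    character of an entity gets "B-<type>", the rest get "I-<type>".
    """
    chars: List[Tuple[str, str]] = []
    for token, label in tokens:
        for i, ch in enumerate(token):
            if label == "O":
                tag = "O"
            elif i == 0:
                tag = "B-" + label
            else:
                tag = "I-" + label
            chars.append((ch, tag))
    return chars


# Mixed Chinese/English toy sentence (invented for illustration).
sentence = [("Alice", "PER"), ("在", "O"), ("北京", "LOC")]
chars = to_char_level(sentence)
```

Because every language is reduced to the same character-plus-tag representation, the same routine handles both scripts without a language detection step, which matches the motivation given in the abstract.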

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a Bi-LSTM-CNN-based mixed corpus named entity recognition method.

Background

[0002] Named entity recognition refers to the process of identifying specified entity nouns with specific meanings in a given dataset. Practical scenarios for named entity recognition include:

[0003] Scenario 1: Event detection. Place, time, and person are several basic components of an event. When constructing an event summary, the relevant persons, places, organizations, and so on can be highlighted. In an event search system, the related people, times, and places can be used as index keywords. The relationships among the components of an event describe the event in more detail at the semantic level.

[0004] Scenario 2: Information retrieval. Named entities can be used to enhance and improve the effectiveness of a retrieval system. When the user enters "major", it can be fou...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/279, G06F40/205, G06N3/04, G06N3/08
CPC: G06N3/049, G06N3/08, G06F40/279, G06F40/205, G06N3/045
Inventors: 唐华阳, 岳永鹏, 刘林峰
Owner: 北京知道未来信息技术有限公司