A mixed corpus named entity recognition method based on Bi-LSTM-CNN

A named entity recognition technology for mixed-language corpora, applied in the information field. It addresses vanishing gradients during training, the low recognition rate for unregistered (out-of-vocabulary) words, and the lack of a clear advantage in final named entity recognition results, and it improves accuracy while avoiding the unregistered-word problem.

Active Publication Date: 2021-02-09

北京知道未来信息技术有限公司

5 Cites, 0 Cited by

Problems solved by technology

[0012] Disadvantage 1: In a multilingual corpus, the granularity of language detection is hard to control, and word segmentation accuracy drops whenever one of the languages goes undetected.

[0013] Disadvantage 2: Methods based on word-frequency statistics, such as HMM (hidden Markov model) and CRF (conditional random field), relate the current word only to the semantics of the preceding word. Their recognition accuracy is not high enough, and their recognition rate for unregistered (out-of-vocabulary) words is especially low.

[0014] Disadvantage 3: Methods based on artificial neural network models suffer from vanishing gradients during training. In practice the networks have few layers, so the final named entity recognition results show no obvious advantage.

Embodiment Construction

[0045] In order to make the above objects, features and advantages of the present invention more comprehensible, the present invention will be further described in detail below through specific embodiments and accompanying drawings.

[0046] The invention discloses a Bi-LSTM-CNN-based named entity recognition method for mixed corpora, for example, identifying named entities such as person names, place names, and organization names in corpus data that mixes multiple languages. The invention addresses three core problems: (1) the efficiency of mixed corpus recognition, (2) the precision of named entity recognition, and (3) the recognition precision for unregistered words.

[0047] To solve the unregistered-word problem, the present invention abandons the traditional vocabulary (dictionary) method and adopts a vector-based approach instead, using character-level vectors rather than word-level vectors.
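The motivation in [0047] can be illustrated with a minimal sketch. It is not the patent's implementation: the `UNK` placeholder, the helper names `build_vocab` and `oov_rate`, and the toy word lists are all assumptions made for illustration. It compares how often a word-level vocabulary and a character-level vocabulary fail to cover unseen test words in a mixed Chinese and English sample.

```python
UNK = "<unk>"  # hypothetical placeholder id for units never seen in training


def build_vocab(units):
    """Map each distinct unit (word or character) to an integer id; id 0 is UNK."""
    vocab = {UNK: 0}
    for unit in units:
        if unit not in vocab:
            vocab[unit] = len(vocab)
    return vocab


def oov_rate(units, vocab):
    """Fraction of units that are missing from the vocabulary."""
    units = list(units)
    if not units:
        return 0.0
    missing = sum(1 for unit in units if unit not in vocab)
    return missing / len(units)


# Toy mixed-language data (assumed for illustration only).
train_words = ["北京", "大学", "London", "Bridge"]
test_words = ["北大", "Lodge"]  # neither word appears in the training words

word_vocab = build_vocab(train_words)
char_vocab = build_vocab(ch for word in train_words for ch in word)

word_oov = oov_rate(test_words, word_vocab)
char_oov = oov_rate([ch for word in test_words for ch in word], char_vocab)

print("word-level OOV rate:", word_oov)
print("char-level OOV rate:", char_oov)
```

In this toy sample every test word is unregistered at the word level, yet each of its characters was already seen in training, so a character-level vector can still be looked up for every position. Real corpora will not cover every character, which is why the sketch reserves an unknown id.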

[0048] In order to solve the problem of low precision of the...

Abstract

The invention relates to a Bi-LSTM-CNN-based mixed corpus named entity recognition method. In the training phase, the method converts the labeled mixed corpus training data into character-level mixed corpus data and then trains a Bi-LSTM-CNN-based deep learning model. In the prediction phase, it converts the unlabeled mixed corpus test data into character-level mixed corpus data and then uses the model trained in the training phase to make predictions. The invention uses character-level rather than word-level vectors, which avoids dependence on word segmentation accuracy and also avoids the unregistered-word problem. In addition, it combines a bidirectional long short-term memory network (Bi-LSTM) with a convolutional neural network (CNN), which greatly improves accuracy compared with traditional algorithms. Because the mixed corpus is used directly for model training, there is no need to detect and separate the individual languages in the corpus, and the goal of recognizing named entities in mixed corpora is achieved.
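The character-level conversion step described in the abstract can be sketched as follows. This is a hedged illustration, not the patent's procedure: the input format (a list of `(text, label)` segments), the BIO tagging scheme, the function names `to_char_level` and `to_chars`, and the sample sentence are all assumptions, since the source does not specify them here.

```python
def to_char_level(segments):
    """Convert labeled segments into parallel character and tag sequences.

    `segments` is a list of (text, label) pairs, where label is an entity
    type such as "PER" or "LOC", or None for text outside any entity.
    Tags follow the BIO scheme (an assumption, not taken from the patent).
    """
    chars, tags = [], []
    for text, label in segments:
        for i, ch in enumerate(text):
            chars.append(ch)
            if label is None:
                tags.append("O")
            else:
                tags.append(("B-" if i == 0 else "I-") + label)
    return chars, tags


def to_chars(text):
    """Prediction phase: unlabeled text becomes a plain character sequence."""
    return list(text)


# Training-phase example on a mixed Chinese and English sentence (toy data).
segments = [("张伟", "PER"), (" works in ", None), ("London", "LOC")]
chars, tags = to_char_level(segments)

for ch, tag in zip(chars, tags):
    print(repr(ch), tag)

# Prediction-phase example: same splitting, no labels and no language detection.
unlabeled = to_chars("张伟 works in London")
```

Both languages pass through the same per-character split, so no language detection or word segmentation is needed before the sequences are handed to a model.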

Description

Technical field

[0001] The invention belongs to the field of information technology, and in particular relates to a Bi-LSTM-CNN-based mixed corpus named entity recognition method.

Background

[0002] Named entity recognition refers to the process of identifying specified entity nouns with specific meanings in a given dataset. Practical scenarios for named entity recognition include:

[0003] Scenario 1: Event detection. Place, time, and person are basic components of an event. When constructing an event summary, the relevant persons, places, organizations, and so on can be highlighted. In an event search system, the related people, times, and places can be used as index keywords. The relationships among the components of an event describe the event in more detail at the semantic level.

[0004] Scenario 2: Information retrieval. Named entities can be used to enhance and improve the effectiveness of a retrieval system. When the user enters "major", it can be fou...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G06F40/279, G06F40/205, G06N3/04, G06N3/08

CPC: G06N3/049, G06N3/08, G06F40/279, G06F40/205, G06N3/045

Inventor: 唐华阳, 岳永鹏, 刘林峰

Owner: 北京知道未来信息技术有限公司
