Entity identification method and device, electronic equipment and storage medium
A technology for entity recognition and entity grouping, applied in the fields of neural learning methods, electrical digital data processing, instruments, etc. It addresses problems such as poor portability, cumbersome feature selection, and a large labeling workload, and achieves improved representation ability and high labeling efficiency.
Examples
Embodiment 1
[0048] Embodiment 1 provides an entity recognition method, as shown in Figure 1, comprising the following steps:
[0049] S110. Learn the text to be labeled based on the BERT model to obtain a word vector for each word in the text to be labeled, and form a text vector from the word vectors of all the words.
[0050] The BERT (Bidirectional Encoder Representations from Transformers) model is a deep bidirectional pre-trained language understanding model that uses the Transformer model as a feature extractor. Essentially, it learns good feature representations for words by running a self-supervised learning method over a massive corpus, where self-supervised learning refers to supervised learning on data that has not been manually labeled. The Transformer model is a classic NLP model proposed by the Google team; it models a piece of text based on the attention mechanism, can be trained in parallel, and captures global information....
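A minimal sketch of step S110, assuming the HuggingFace transformers library and the bert-base-chinese checkpoint; the patent does not name a specific implementation or checkpoint, so both are illustrative assumptions.

```python
import torch
from transformers import BertTokenizer, BertModel

# Assumed checkpoint; the patent only specifies "the BERT model".
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

text = "患者出现发热症状"  # hypothetical text to be labeled

# Run BERT: last_hidden_state holds one vector per token, and the
# stacked per-word vectors form the "text vector" described in S110.
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
word_vectors = outputs.last_hidden_state  # shape: (1, seq_len, 768)
```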
Embodiment 2
[0075] Embodiment 2 is an improvement on Embodiment 1. As shown in Figure 2, learning the text to be labeled based on the BERT model to obtain the word vector of each word in the text to be labeled, and forming a text vector from the word vectors of the words, includes the following steps:
[0076] S210. Place a sentence-start tag at the beginning of the text to be labeled, a sentence-end tag at its end, and a segmentation tag between every two sentences, to obtain an intermediate text. Usually, the sentence-start tag, the sentence-end tag, and the segmentation tag are the [CLS] tag, the [SEP] tag, and the [SEP] tag, respectively, which makes it convenient to obtain the context information of each word in the text to be labeled when learning based on the BERT model.
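A minimal sketch of step S210, assuming the text has already been split into sentences; the [CLS]/[SEP] placement mirrors standard BERT input formatting.

```python
def build_intermediate_text(sentences):
    # [CLS] opens the text; [SEP] both separates sentences and ends the text,
    # matching the tag choices described in S210.
    return "[CLS]" + "[SEP]".join(sentences) + "[SEP]"

print(build_intermediate_text(["第一句话", "第二句话"]))
# -> [CLS]第一句话[SEP]第二句话[SEP]
```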
[0077] S220. Segment the intermediate text at the character level to obtain a plurality of words, randomly select several wo...
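A minimal sketch of the character-level segmentation in S220. The purpose of the random word selection is truncated in the source, so only the splitting step is shown; keeping the [CLS]/[SEP] tags as single units is an assumption.

```python
def char_level_segment(intermediate_text):
    tokens, i = [], 0
    while i < len(intermediate_text):
        # Assumed behavior: special tags such as [CLS]/[SEP] stay whole.
        if intermediate_text.startswith(("[CLS]", "[SEP]"), i):
            tokens.append(intermediate_text[i:i + 5])
            i += 5
        else:
            # Every other character becomes its own "word".
            tokens.append(intermediate_text[i])
            i += 1
    return tokens

print(char_level_segment("[CLS]实体识别[SEP]"))
# -> ['[CLS]', '实', '体', '识', '别', '[SEP]']
```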
Embodiment 3
[0093] Embodiment 3 discloses an entity recognition device corresponding to the above embodiments; it is the virtual device structure of the above embodiments. As shown in Figure 3, it includes:
[0094] A text vector calculation module 410, used to learn the text to be labeled based on the BERT model to obtain the word vector of each word in the text to be labeled, and to form a text vector from the word vectors of the words;
[0095] A model set and unlabeled corpus acquisition module 420, configured to acquire a model set containing N preliminarily trained neural network models and an unlabeled corpus containing a plurality of unlabeled texts, the N preliminarily trained neural network models being respectively recorded as mi, i = 1, ..., N, N > 2;
[0096] A collaborative training module 430, configured to, for each of the preliminarily trained neural network models mi, identify each of the unlabeled texts based on the other N-1 preliminarily trained neural ...
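A minimal sketch of the co-training idea in module 430, assuming each model exposes a predict(text) method returning a label sequence. The patent text is truncated here, so the rule that a text is pseudo-labeled for mi only when the other N-1 models all agree is an assumption, not the patent's stated criterion.

```python
def cotrain_round(models, unlabeled_texts):
    # For each model m_i, collect pseudo-labeled texts produced by the
    # other N-1 models; `predict` is an assumed interface.
    new_training_sets = {i: [] for i in range(len(models))}
    for i in range(len(models)):
        others = [m for j, m in enumerate(models) if j != i]
        for text in unlabeled_texts:
            predictions = [tuple(m.predict(text)) for m in others]
            # Assumed agreement rule: keep the text for m_i only if all
            # other models produce the same label sequence.
            if len(set(predictions)) == 1:
                new_training_sets[i].append((text, list(predictions[0])))
    return new_training_sets
```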