Chinese text data word vector representation method based on BIE position word list

The technology concerns text data and word vectors and is applied in digital data processing, natural language data processing, and related fields. It addresses the problems of lexical information loss and limited applicability, achieves high accuracy, and improves the ability to handle nested entities.

Pending Publication Date: 2022-04-05

CHONGQING UNIV OF POSTS & TELECOMM

Problems solved by technology

The effect of lexical information integration depends strongly on the embedding strategy. The WC-LSTM structure loses lexical information, and the Multi-digraph structure relies on dictionary labels, which makes it difficult to apply widely.

Examples

Embodiment 1

[0046] Refer to Figure 1, a flow chart of a method for representing Chinese text data word vectors based on a BIE position word list provided by an embodiment of the present invention. The method specifically includes:

[0047] Chinese text word vectors have difficulty expressing the position and boundary information of Chinese words, which poses a major challenge for Chinese named entity recognition. This embodiment therefore focuses on Chinese text data sets.

[0048] The key of the present invention is how to express, in the word vector, the position of a character within its corresponding word. In the Chinese entity recognition task, learning lexical boundaries helps the model distinguish entity boundaries, so the three BIE position dimensions (Begin, Inside, End) are used to express the position of a character within a word. At the same time, in order to take into account both the total word set and the strong correlation word set, the weight T is u...
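As a minimal sketch of the B/I/E idea in paragraph [0048], the snippet below scans a sentence against a small lexicon and records, for each character, the words in which it sits at the Begin, Inside, or End position. The function name `bie_word_sets`, the toy lexicon, the `max_len` parameter, and the choice to skip single-character words are illustrative assumptions, not details taken from the patent.

```python
from collections import defaultdict

def bie_word_sets(sentence, lexicon, max_len=4):
    """For each character index, group the lexicon words covering it by B/I/E position.

    Hypothetical helper: single-character words are skipped (assumption).
    """
    sets = [defaultdict(list) for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every candidate span of 2..max_len characters starting here.
        for end in range(start + 2, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            sets[start]["B"].append(word)          # first character of the word
            for k in range(start + 1, end - 1):
                sets[k]["I"].append(word)          # characters strictly inside
            sets[end - 1]["E"].append(word)        # last character of the word
    return [dict(s) for s in sets]

# Toy example (lexicon contents are made up for illustration).
sentence = "重庆邮电大学"
lexicon = {"重庆", "邮电", "大学", "邮电大学", "重庆邮电大学"}
result = bie_word_sets(sentence, lexicon, max_len=6)
for ch, sets in zip(sentence, result):
    print(ch, sets)
```

Nested matches such as "邮电大学" inside "重庆邮电大学" place the same character in several position sets at once, which is the kind of overlapping boundary information the paragraph describes.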

Abstract

The invention relates to a Chinese text data word vector representation method based on a BIE position word list, in the field of deep learning and named entity recognition. The method comprises the following steps: S1, generating a total word set and a strong correlation word set, and constructing the BIE position word list; S2, constructing a position-independent word vector from the original representation of the word vector; S3, condensing the word vector representations in each word set using a word-frequency-weighted average pooling algorithm; and S4, weighting the BIE position word vectors and concatenating them with the original word vector to generate a word vector that contains vocabulary position information. The method fuses the overall position information of the vocabulary into the word vectors while highlighting the position information of strongly correlated words. It also expands the dimension of the character vector representation, so that Chinese entity recognition achieves higher accuracy.
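Steps S3 and S4 can be illustrated with a short sketch, written under stated assumptions: each position set (B, I, E) is pooled into one vector by averaging its word vectors weighted by word frequency, and the pooled vectors are scaled and concatenated onto the original vector. The names `weighted_avg_pool` and `position_aware_vector`, the toy embeddings and frequencies, and the use of a single scalar `t` are all hypothetical; the excerpt does not show how the patent actually applies its weight.

```python
def weighted_avg_pool(word_vecs, word_freqs, dim):
    """Frequency-weighted average of word vectors; a zero vector if the set is empty."""
    total = sum(word_freqs)
    if total == 0:
        return [0.0] * dim
    pooled = [0.0] * dim
    for vec, freq in zip(word_vecs, word_freqs):
        for i, x in enumerate(vec):
            pooled[i] += freq * x / total
    return pooled

def position_aware_vector(char_vec, bie_sets, word_emb, word_freq, dim, t=1.0):
    """Concatenate the original vector with scaled, pooled B, I and E vectors.

    `t` is an assumed scalar weight; the patent's actual weighting is not shown.
    """
    out = list(char_vec)
    for pos in ("B", "I", "E"):
        words = bie_sets.get(pos, [])
        vecs = [word_emb[w] for w in words]
        freqs = [word_freq[w] for w in words]
        out.extend(t * x for x in weighted_avg_pool(vecs, freqs, dim))
    return out

# Toy data (made up for illustration).
word_emb = {"重庆": [1.0, 0.0], "重庆邮电大学": [0.0, 1.0]}
word_freq = {"重庆": 3, "重庆邮电大学": 1}
char_vec = [0.5, 0.5]
bie = {"B": ["重庆", "重庆邮电大学"]}
vec = position_aware_vector(char_vec, bie, word_emb, word_freq, dim=2, t=2.0)
print(len(vec), vec)
```

With a 2-dimensional input, the output has 8 dimensions (the original vector plus one pooled vector per position), which mirrors the dimension expansion mentioned in the abstract.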

Description

Technical field

[0001] The invention belongs to the field of deep learning and named entity recognition, and relates to a Chinese text data word vector representation method based on a BIE position word list.

Background

[0002] Named Entity Recognition (NER) is a fundamental task in natural language processing and a subtask of tasks such as information retrieval, relation extraction, and question answering. Unlike English text, which is naturally segmented into words, Chinese text has no fixed number of characters per word and no word segmentation markers. This makes it difficult for Chinese named entity recognition to learn lexical boundary information. Lexical position information therefore needs to be incorporated into the Chinese word vector representation to improve the accuracy of Chinese named entity recognition.

[0003] At present, the widely used vocabulary enhancement methods are vocabulary enhancement methods ...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06F40/295, G06F40/216, G06F40/284

Inventor: 王进, 王猛旗, 林兴, 杜雨露, 孙开伟

Owner: CHONGQING UNIV OF POSTS & TELECOMM
