
A method and device for word sense disambiguation based on word vectors

A word sense disambiguation and word vector technology, applied in fields such as instruments, computing, and electronic digital data processing. It addresses the difficulty of expressing a word's semantic information and the semantic relationships between words, as well as the problem of sparse data.

Active Publication Date: 2021-11-23
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0005] Word vectors have long been used in word sense disambiguation tasks. The earlier representation method, one-hot representation, gives each word a vector whose length equals the size of the vocabulary. Every position of the vector is zero except the dimension corresponding to the word's position in the vocabulary, which is 1. Such a vector can hardly express the semantic information contained in a word or the semantic relationships between words. In addition, this representation suffers from data sparsity.
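
The limitation described in [0005] can be illustrated with a minimal sketch. The five-word vocabulary below is a hypothetical toy example, not taken from the patent; the helper names `one_hot` and `dot` are likewise only for illustration.

```python
def one_hot(word, vocabulary):
    """Return the one-hot vector for `word` over an ordered vocabulary list."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy vocabulary, chosen only for illustration.
vocabulary = ["activate", "start", "spade", "lever", "garden"]

activate = one_hot("activate", vocabulary)
start = one_hot("start", vocabulary)

# Two distinct words always have dot product 0, so related words
# look no more similar than unrelated ones.
similarity = dot(activate, start)

# Fraction of zero entries is (V - 1) / V, which approaches 1 as
# the vocabulary size V grows: the data sparsity problem.
sparsity = activate.count(0) / len(activate)
```

With a realistic vocabulary of tens of thousands of words, each vector would hold a single 1 among tens of thousands of zeros, and every pair of distinct words would still be orthogonal.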



Examples


Embodiment 1

[0044] Embodiment 1: This embodiment uses the Senseval-3 data set, which includes a training set, a test set, and a collection of the senses of all ambiguous words. The training set contains 7860 documents and the test set contains 3944 documents. Each document comes with its ambiguous word, a document code, and the correct sense of the ambiguous word in that document. The collection of senses contains the codes and definitions for 57 ambiguous words. A document containing the ambiguous word "activate" is now taken as an example of disambiguation.

[0045] Document containing the ambiguous word "activate": Do you know what it is , and where I can get one . We suspect you had seen the Terrex Autospade , which is made by Wolf Tools . It is quite a hefty spade , with bicycle - type handlebars and a sprung lever at the rear , which you step on to activate it . Used correctly , you should n't have to bend your back during general digging , althou...
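
As a rough illustration of how such a document might be prepared before disambiguation, the sketch below removes punctuation, splits a fragment of the sample into tokens, and collects the words around the ambiguous word. The function names `preprocess` and `context_window`, the lowercasing, and the window size of 3 are assumptions made for this example; the source does not specify them.

```python
import re


def preprocess(text):
    """Lowercase the text, drop punctuation, and split it into tokens."""
    # Keep letters, digits and apostrophes; everything else becomes a space.
    cleaned = re.sub(r"[^a-z0-9']+", " ", text.lower())
    return cleaned.split()


def context_window(tokens, target, size):
    """Return up to `size` tokens on each side of the first `target`."""
    idx = tokens.index(target)
    left = tokens[max(0, idx - size):idx]
    right = tokens[idx + 1:idx + 1 + size]
    return left, right


# A fragment of the sample document (already tokenized with spaces).
sample = "a sprung lever at the rear , which you step on to activate it ."

tokens = preprocess(sample)
left, right = context_window(tokens, "activate", 3)
```

Here `left` holds the three tokens before "activate" and `right` the single token after it, since the fragment ends there.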



Abstract

The present invention relates to a method and device for word sense disambiguation based on word vectors. The method comprises: a data preprocessing step, which performs punctuation processing and word segmentation on the documents and the sense definitions; a word vector training step, which trains word vectors with a word vector training tool; a context vector representation step, which obtains the word vectors of the context words and computes the context vector using a local weighting method; a sense vector representation step, which obtains the word vector of each word in a sense definition and computes the sense vector; a similarity calculation step, which computes the cosine similarity between the context vector and each sense vector; a sense frequency calculation step, which counts the distribution frequency of each sense of the ambiguous word in the data set; and a final scoring step, which combines the cosine similarity between the context and each sense with the frequency of that sense into a composite score and selects the sense with the highest score as the best sense.
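
The scoring steps in the abstract can be sketched as follows. This is a sketch under stated assumptions, not the patented method: the abstract names a "local weighting method" and a "comprehensive score" without giving formulas, so the 1/distance weights, the averaging of sense-definition vectors, and the linear mix controlled by a hypothetical parameter `alpha` are all assumptions. The two-dimensional vectors, sense labels, definitions, and frequencies are made-up toy data.

```python
import math


def weighted_context_vector(tokens, target_index, vectors):
    """Sum context word vectors, each weighted by 1/distance to the target.

    The 1/distance weighting is an assumed stand-in for the
    'local weighting method' named in the abstract.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    for i, token in enumerate(tokens):
        if i == target_index or token not in vectors:
            continue
        weight = 1.0 / abs(i - target_index)
        total = [t + weight * x for t, x in zip(total, vectors[token])]
    return total


def mean_vector(words, vectors):
    """Average the vectors of the words that have one (assumed sense vector)."""
    dim = len(next(iter(vectors.values())))
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        return [0.0] * dim
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def disambiguate(tokens, target_index, definitions, frequencies, vectors, alpha=0.7):
    """Score each sense as alpha * cosine + (1 - alpha) * frequency (assumed mix)."""
    context = weighted_context_vector(tokens, target_index, vectors)
    scores = {}
    for sense, words in definitions.items():
        similarity = cosine(context, mean_vector(words, vectors))
        scores[sense] = alpha * similarity + (1 - alpha) * frequencies[sense]
    best = max(scores, key=scores.get)
    return best, scores


# Toy 2-d word vectors, hypothetical and for illustration only.
vectors = {
    "step": [1.0, 0.0], "lever": [0.9, 0.1],
    "machine": [1.0, 0.1], "operate": [0.9, 0.0],
    "chemical": [0.0, 1.0], "reaction": [0.1, 0.9],
}
tokens = ["lever", "you", "step", "on", "to", "activate", "it"]
# Hypothetical sense labels, definitions and relative frequencies.
definitions = {
    "make_operate": ["make", "a", "machine", "operate"],
    "make_reactive": ["make", "a", "chemical", "reaction"],
}
frequencies = {"make_operate": 0.6, "make_reactive": 0.4}

best, scores = disambiguate(tokens, 5, definitions, frequencies, vectors)
```

In practice the vectors would come from a trained word vector model, and the sense frequencies from counts over the training set. Words without a vector are skipped here, which is one possible design choice among several.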

Description

Technical field

[0001] The present invention relates to a word sense disambiguation method and device based on word vectors, and belongs to fields such as natural language processing, machine translation, and artificial intelligence.

Background

[0002] In recent years, with the development of science and technology, word sense disambiguation has become increasingly important in natural language processing, machine translation, artificial intelligence, and other fields, and has become an urgent problem to be solved.

[0003] As the concept of word sense disambiguation gained popularity, scholars proposed solutions to it. The disambiguation knowledge used in the early days consisted of hand-crafted rules, but writing the rules manually was time-consuming and laborious, and there was a bottleneck problem of knowledge a...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/47, G06F40/44, G06K9/62
CPC: G06F40/289, G06F40/58
Inventor: 吕晓伟, 贾连印
Owner: KUNMING UNIV OF SCI & TECH