Bayesian word sense disambiguation method based on mass pseudo-data

A word sense disambiguation technology based on pseudo-data, applied in the field of natural language processing. It addresses the time-consuming, labor-intensive acquisition of disambiguation knowledge and the poor disambiguation performance of existing methods, alleviating data sparseness and improving accuracy.

Inactive Publication Date: 2017-11-17
SHANXI UNIV
6 Cites, 4 Cited by
AI Technical Summary

Problems solved by technology

[0005] The present invention mainly addresses two problems of current word sense disambiguation methods: poor disambiguation performance, and the time and labor required to acquire disambiguation knowledge. It provides a Bayesian word sense disambiguation method based on a large amount of pseudo-data.

Examples

Embodiment Construction

[0027] A specific implementation of the present invention is given below with an example. The sentence "Project One unit One-time ignition success" serves as the training corpus, and the sentence "wind power unit System Analysis Key Technology Research" serves as the test corpus; the ambiguous word "unit" in the test corpus is to be disambiguated. The candidate senses of "unit" are "machine" and "personnel".

[0028] The Bayesian word sense disambiguation method of the present invention, based on a large amount of pseudo-data, comprises the following steps:

[0029] Step 1. Use a dependency parser to analyze the training examples, and collect tuples that have a dependency relationship with the target ambiguous word. The specific operations are as follows:

[0030] Parse the instance syntactically, as shown in Figure 2. This yields the dependency tuples (number, unit) and (unit, ignition). Take the second tuple (unit, ignition) as an example to illustrate the working principle of...
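The tuple collection in Step 1 can be illustrated with a minimal sketch. It assumes the dependency parse is already available as a list of word pairs; the `arcs` list below is hand-written for illustration (only the two pairs involving "unit" come from the text above), and `collect_tuples` is a hypothetical helper name, not part of the patent.

```python
from typing import List, Tuple

Arc = Tuple[str, str]  # two words linked by a dependency relation

def collect_tuples(arcs: List[Arc], target: str) -> List[Arc]:
    """Return every dependency pair in which the target word participates."""
    return [arc for arc in arcs if target in arc]

# Hypothetical parse of the training sentence. Only ("number", "unit") and
# ("unit", "ignition") are taken from the description; the other arcs are
# assumptions added so the filter has something to discard.
arcs = [
    ("project", "number"),
    ("number", "unit"),
    ("unit", "ignition"),
    ("ignition", "success"),
]

tuples = collect_tuples(arcs, "unit")
print(tuples)
```

In practice the arcs would come from whichever dependency parser is used; the filter itself does not depend on the parser.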

Abstract

The invention relates to a new Bayesian word sense disambiguation method based on mass pseudo-data. It addresses two problems of current word sense disambiguation methods: poor disambiguation performance, and the time and labor required to acquire disambiguation knowledge. In the method, a dependency parser syntactically analyzes the training examples that contain ambiguous words in a training corpus, and the tuples that have a dependency relationship with the ambiguous words are collected. A machine translation system is then used to search a machine translation corpus for example sentences containing those tuples. These steps are repeated, and the retrieved example sentences are added to a pseudo-training corpus. A Bayesian disambiguation model is then trained on the training corpus together with the pseudo-training corpus, and the model decides the senses of the ambiguous words. Starting from a small amount of manually annotated corpus, the method can effectively alleviate the data sparsity problem of word sense disambiguation and increase disambiguation accuracy, and it has broad development prospects.
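The final training step can be sketched as a small naive Bayes classifier fitted on the union of a manually annotated corpus and a pseudo-training corpus. This is a minimal sketch under stated assumptions, not the patented implementation: the abstract does not give the exact model form, so add-one smoothing over bag-of-words context features is assumed here, the toy sentences are invented, and each pseudo example is assumed to inherit the sense of the annotated instance whose tuple retrieved it.

```python
import math
from collections import Counter, defaultdict

def train(examples):
    """Collect naive Bayes counts from (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def predict(model, words):
    """Return the sense with the highest smoothed log posterior."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        # Add-one smoothing: denominator is token count plus vocabulary size.
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(n / total)
        for w in words:
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best, best_score = sense, score
    return best

# Toy data (hypothetical). Context words exclude the target word "unit".
manual = [
    (["project", "ignition", "success"], "machine"),
    (["staff", "training", "meeting"], "personnel"),
]
# Pseudo examples: sentences assumed to have been retrieved because they
# contain the tuple (unit, ignition); they inherit the sense "machine".
pseudo = [
    (["turbine", "ignition", "test"], "machine"),
    (["wind", "power", "ignition"], "machine"),
]

model = train(manual + pseudo)
sense = predict(model, ["wind", "power", "system", "analysis"])
print(sense)
```

The pseudo examples contribute context words such as "wind" and "power" that the two annotated sentences never contain, which is the sense in which extra retrieved sentences can ease data sparseness.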

Description

Technical field

[0001] The invention belongs to the technical field of natural language processing, and in particular relates to a new Bayesian word sense disambiguation method based on a large amount of pseudo-data.

Technical background

[0002] Word sense disambiguation (WSD) refers to determining the meaning of a polysemous word in a specific natural-language context, and is a core problem in the field of natural language processing. When a machine processes natural language and an ambiguous word appears in a specific context, lexical ambiguity arises; in the current Internet age of "information explosion", the problem is even more serious. Polysemy is common in both Chinese and Western languages.

[0003] Currently, corpus-based word sense disambiguation methods can be divided into supervised and unsupervised methods. Unsupervised methods do not require training co...

Application Information

IPC(8): G06F17/27
CPC: G06F40/211, G06F40/216, G06F40/247, G06F40/284
Inventors: 杨陟卓, 张虎, 李茹, 谭红叶, 陈千
Owner: SHANXI UNIV