A Maximum Entropy-Based Cross Disambiguation Method for Vietnamese Language

A technique based on the maximum entropy model, applied in natural language data processing, network data indexing, and related fields.

Active Publication Date: 2020-07-10
KUNMING UNIV OF SCI & TECH

AI Technical Summary

Problems solved by technology

[0003] The present invention provides a maximum entropy-based Vietnamese cross-ambiguity disambiguation method, which resolves cross-ambiguity fields and thereby improves the accuracy of word segmentation and related tasks.

Examples

Embodiment 1

[0046] Embodiment 1: As shown in Figures 1-3, a maximum entropy-based Vietnamese cross-ambiguity disambiguation method comprises the following steps:

[0047] Step1. Perform disambiguation modeling on the Vietnamese cross-ambiguity field corpus in the constructed Vietnamese cross-ambiguity field database to obtain the Vietnamese maximum entropy cross-ambiguity disambiguation model;

[0048] Step2. Randomly select a test corpus from the Vietnamese cross-ambiguity field corpus and disambiguate it with the trained Vietnamese maximum entropy cross-ambiguity disambiguation model to obtain the disambiguation parameter sequence.
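The two steps can be illustrated with a small sketch. The text above does not specify the feature templates, the training algorithm, or the exact form of the disambiguation parameter sequence, so everything below is an assumption made for illustration: a conditional maximum entropy (log-linear) model trained by plain gradient ascent, hypothetical context features such as `prev=p1`, and two hypothetical labels `AB|C` and `A|BC` that say how a three-syllable overlapping field is split. The function names are invented here and do not come from the patent.

```python
import math
import random
from collections import defaultdict


def predict_proba(weights, classes, feats):
    """Conditional distribution p(label | features) of a log-linear (maxent) model."""
    scores = {c: sum(weights.get((f, c), 0.0) for f in feats) for c in classes}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def train_maxent(samples, labels, epochs=200, lr=0.5):
    """Step1 (sketch): fit weights by gradient ascent on the log-likelihood."""
    classes = sorted(set(labels))
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in zip(samples, labels):
            probs = predict_proba(weights, classes, feats)
            for f in feats:
                grad[(f, gold)] += 1.0        # observed feature count
                for c in classes:
                    grad[(f, c)] -= probs[c]  # expected count under the model
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return dict(weights), classes


def disambiguate(weights, classes, feats):
    """Pick the most probable segmentation label for one ambiguity field."""
    probs = predict_proba(weights, classes, feats)
    return max(classes, key=lambda c: probs[c])


# Hypothetical toy corpus: each item is the context of a field "A B C" where both
# "A B" and "B C" are dictionary words; the label says which split is correct.
corpus = [
    (["prev=p1", "next=n1"], "AB|C"),
    (["prev=p1", "next=n2"], "AB|C"),
    (["prev=p2", "next=n1"], "A|BC"),
    (["prev=p2", "next=n2"], "A|BC"),
]
weights, classes = train_maxent([f for f, _ in corpus], [y for _, y in corpus])

# Step2 (sketch): randomly draw test items from the corpus and label them.
rng = random.Random(0)
test_items = rng.sample(corpus, 2)
predicted = [disambiguate(weights, classes, feats) for feats, _ in test_items]
```

In this toy setup the label depends only on the `prev=` feature, so the model learns large weights for those features and leaves the `next=` weights near zero. A real system would presumably use richer context features and a held-out evaluation; the sketch only mirrors the train-then-disambiguate flow described in the steps.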

Embodiment 2

[0049] Embodiment 2: As shown in Figures 1-3, a maximum entropy-based Vietnamese cross-ambiguity disambiguation method comprises the following steps:

[0050] Step1. Perform disambiguation modeling on the Vietnamese cross-ambiguity field corpus in the constructed Vietnamese cross-ambiguity field database to obtain the Vietnamese maximum entropy cross-ambiguity disambiguation model;

[0051] Step2. Randomly select a test corpus from the Vietnamese cross-ambiguity field corpus and disambiguate it with the trained Vietnamese maximum entropy cross-ambiguity disambiguation model to obtain the disambiguation parameter sequence.

[0052] The specific steps of disambiguation modeling in Step1 are as follows:

[0053] Step1.1. Use a crawler program to fetch webpage information from the Internet;

[0054] Step1.2. Filter and process the crawled webpage infor...

Embodiment 3

[0060] Embodiment 3: As shown in Figures 1-3, a maximum entropy-based Vietnamese cross-ambiguity disambiguation method comprises the following steps:

[0061] Step1. Perform disambiguation modeling on the Vietnamese cross-ambiguity field corpus in the constructed Vietnamese cross-ambiguity field database to obtain the Vietnamese maximum entropy cross-ambiguity disambiguation model;

[0062] Step2. Randomly select a test corpus from the Vietnamese cross-ambiguity field corpus and disambiguate it with the trained Vietnamese maximum entropy cross-ambiguity disambiguation model to obtain the disambiguation parameter sequence.

[0063] The specific steps of disambiguation modeling in Step1 are as follows:

[0064] Step1.1. Use a crawler program to fetch webpage information from the Internet;

[0065] Step1.2. Filter and process the crawled webpage info...

Abstract

The invention relates to a maximum entropy-based Vietnamese cross-ambiguity disambiguation method and belongs to the technical field of natural language processing. The method first performs disambiguation modeling on the Vietnamese cross-ambiguity field corpus in a constructed Vietnamese cross-ambiguity field library to obtain a Vietnamese maximum entropy cross-ambiguity disambiguation model. It then randomly selects a test corpus from the Vietnamese cross-ambiguity field corpus and disambiguates it with the trained model to obtain a disambiguated parameter sequence. The method resolves the ambiguity of Vietnamese cross-ambiguity words and supports downstream work such as lexical analysis, syntactic analysis, semantic analysis, information extraction, information retrieval, and machine translation. No prior reports on Vietnamese cross-ambiguity disambiguation have been found to date, and the method is reported to achieve good results.

Description

Technical field

[0001] The invention relates to a maximum entropy-based Vietnamese cross-ambiguity disambiguation method, which belongs to the technical field of natural language processing.

Background

[0002] Vietnamese ambiguity disambiguation is a key step in word segmentation and part-of-speech tagging; it underlies other higher-level applications and plays an extremely important role. In Vietnamese information processing software and systems, cross-ambiguity disambiguation is indispensable. As Internet search technology has improved, disambiguation has attracted increasing attention. How well ambiguous fields are disambiguated determines the accuracy of search; disambiguation also improves the results of lexical analysis, syntactic analysis, semantic analysis, and applications such as machine translation. Ambiguity is divided into intersection ambiguity and combination am...
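To make the term concrete: in the word segmentation literature, an intersection (overlapping, or cross) ambiguity arises when a string A B C can be read so that both "A B" and "B C" are dictionary words. The sketch below finds such fields in a syllable sequence. It is an illustration only: the function name `find_overlap_fields`, the `max_len` limit, and the tiny lexicon are assumptions, and the commonly cited Vietnamese example "học sinh học" ("học sinh", pupil, overlapping "sinh học", biology) is not taken from the patent text.

```python
def find_overlap_fields(syllables, lexicon, max_len=4):
    """Return (start, end) spans where two dictionary words partially overlap."""
    n = len(syllables)
    # Collect every multi-syllable span that forms a dictionary word.
    spans = []
    for i in range(n):
        for j in range(i + 2, min(n, i + max_len) + 1):
            if " ".join(syllables[i:j]) in lexicon:
                spans.append((i, j))
    # Two words form a cross-ambiguity field when the second starts inside
    # the first and ends after it.
    fields = set()
    for a_start, a_end in spans:
        for b_start, b_end in spans:
            if a_start < b_start < a_end < b_end:
                fields.add((a_start, b_end))
    return sorted(fields)


# Illustrative lexicon and input (assumed, not from the patent).
lexicon = {"học sinh", "sinh học"}
syllables = "học sinh học".split()
fields = find_overlap_fields(syllables, lexicon)
```

Each returned span marks a candidate field that a disambiguation model would then have to split one way or the other.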

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F16/951, G06F40/289
CPC: G06F16/951, G06F40/289
Inventor: 余正涛, 刘艳超, 郭剑毅, 毛存礼, 线岩团, 陈玮
Owner: KUNMING UNIV OF SCI & TECH