Method and system for eliminating ambiguity of Chinese word segmentations

A Chinese word segmentation disambiguation technology, applied in special data processing applications, instruments, electrical digital data processing, etc. It addresses the poor real-time performance of existing algorithms and their need for a large training corpus and an ambiguity lexicon.

Active Publication Date: 2016-12-14
北京如布科技有限公司

AI Technical Summary

Problems solved by technology

[0004] The purpose of the present invention is to provide a method for disambiguating Chinese word segmentation that solves two problems of existing algorithms: the need for a large training corpus and an ambiguity lexicon, and the accompanying poor real-time performance.


Examples

Embodiment Construction

[0081] Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.

[0082] Figure 1 shows a flow chart of a method for disambiguating Chinese word segmentation according to an exemplary embodiment, which specifically includes the following steps:

[0083] Step 101: Perform word segmentation on the sentence to be segmented to obtain the initial word segmentation result.

[0084] As is well known, English text is made up of words separated by spaces, whereas Chinese text is made up of characters, and all the words ...
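To illustrate why an initial segmentation can be ambiguous, the following is a minimal sketch of one common baseline, forward maximum matching. The excerpt does not say which segmenter Step 101 uses, so the algorithm choice, the function name `forward_max_match`, the toy `LEXICON`, and the `max_len` parameter are all assumptions made for illustration.

```python
# Hypothetical toy lexicon; a real system would load a full dictionary.
LEXICON = {"研究", "研究生", "生命", "起源"}


def forward_max_match(sentence, lexicon, max_len=4):
    """Greedy left-to-right segmentation: take the longest lexicon word at each position."""
    tokens = []
    i = 0
    while i < len(sentence):
        # Try the longest window first, shrinking until a lexicon word matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback token.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


print(forward_max_match("研究生命起源", LEXICON))
# → ['研究生', '命', '起源']
```

The greedy match takes "研究生" first and strands the character "命", although "研究 / 生命 / 起源" is the intended reading. An initial result of this kind is what a later disambiguation step would need to correct.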


Abstract

The embodiment of the invention provides a method and a system for eliminating ambiguity in Chinese word segmentation. The method comprises the following steps: segmenting a sentence to be segmented to obtain an initial segmentation result; extracting a segmentation ambiguity point from the initial segmentation result; constructing a new candidate word containing the segmentation ambiguity point and calculating its maximum entropy model score; judging from that score whether the new candidate word is a valid word; and correcting the initial segmentation result with the valid word. The embodiment overcomes the prior art's need for a large training corpus and an ambiguity corpus, and the segmentation quality can reach a precision sufficient for practical use.
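The five steps of the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the excerpt does not define how ambiguity points are found, which features the maximum entropy model uses, or what threshold is applied. Here a stranded single-character token is treated as the ambiguity point, the model is a two-class maximum entropy model (equivalent to logistic regression) with hand-set hypothetical `WEIGHTS`, and a candidate counts as valid when its score exceeds 0.5.

```python
import math

# Hypothetical toy lexicon and hand-set weights; a real model would be trained.
LEXICON = {"研究", "研究生", "生命", "起源"}
WEIGHTS = {"bias": -2.0, "cand_in_lexicon": 2.5, "rest_in_lexicon": 1.5}


def maxent_score(features):
    """Two-class maximum entropy score: the logistic function of w . f."""
    z = WEIGHTS["bias"] + sum(WEIGHTS[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def ambiguity_points(tokens):
    """Assumed heuristic: a single-character token between two neighbours is suspicious."""
    return [i for i in range(1, len(tokens) - 1) if len(tokens[i]) == 1]


def candidates(tokens, i):
    """Build new words containing token i by borrowing one neighbouring character."""
    left, mid, right = tokens[i - 1], tokens[i], tokens[i + 1]
    out = []
    if len(left) > 1:
        out.append((left[-1] + mid, [left[:-1], left[-1] + mid, right], left[:-1]))
    if len(right) > 1:
        out.append((mid + right[0], [left, mid + right[0], right[1:]], right[1:]))
    return out  # (candidate word, replacement tokens, leftover of the neighbour)


def disambiguate(tokens, threshold=0.5):
    """Correct the initial segmentation with the best candidate scoring above threshold."""
    result = list(tokens)
    for i in ambiguity_points(tokens):
        if len(result[i]) != 1:
            continue  # already merged by an earlier correction
        best_score, best_tokens = threshold, None
        for word, replacement, rest in candidates(result, i):
            score = maxent_score({
                "cand_in_lexicon": float(word in LEXICON),
                "rest_in_lexicon": float(rest in LEXICON),
            })
            if score > best_score:
                best_score, best_tokens = score, replacement
        if best_tokens is not None:
            # Three tokens replace three tokens, so later indices stay valid.
            result[i - 1:i + 2] = best_tokens
    return result


print(disambiguate(["研究生", "命", "起源"]))
# → ['研究', '生命', '起源']
```

In this toy run the candidate "生命" scores about 0.88 because both it and the leftover "研究" are lexicon words, while "命起" scores about 0.12 and is rejected. A trained model would presumably use richer context features than these two lexicon indicators.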

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to a method and system for eliminating ambiguity in Chinese word segmentation.

Background art

[0002] Chinese word segmentation is a major difficulty in Chinese analysis and computer processing, and disambiguation has long been one of its hardest and most actively studied problems. At present, the commonly used disambiguation methods include the n-gram model method, the verb priority method, the information entropy method, the Chinese ambiguity lexicon method, etc. These methods solve the problem of word segmentation ambiguity to varying degrees, but none of them suits every scenario or effectively eliminates Chinese word segmentation ambiguity in all situations. Taking the existing CAS technology as an example, it on...


Application Information

IPC(8): G06F17/27
CPC: G06F40/289
Inventor: 柳艳红, 郭祥, 郭瑞
Owner: 北京如布科技有限公司