Chinese ambiguity word processing method based on dependency parsing

A technology that applies dependency parsing to ambiguity words. It falls under electrical digital data processing, special data processing applications, instruments, etc., and addresses the low accuracy of part-of-speech recognition for ambiguity words.

Status: Inactive
Publication Date: 2015-10-28
UNIV OF ELECTRONICS SCI & TECH OF CHINA
Cites: 2; Cited by: 14
AI Technical Summary

Problems solved by technology

Common part-of-speech tagging tools such as Stanford NLP, LTP, and ZPar do not specifically handle Chinese ambiguity words (words with multiple parts of speech), so their part-of-speech recognition accuracy for such words is low.

Examples

Embodiment Construction

[0012] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0013] Figure 1 is the flow chart of the method for processing ambiguity words based on dependency parsing provided by an embodiment of the present invention. The method specifically includes the following steps: first, construct a large number of sentences containing ambiguity words as the training corpus and obtain the words with higher accuracy and co...

Abstract

Part-of-speech tagging of Chinese ambiguity words is currently one of the major problems affecting the performance of Chinese information processing systems. An ambiguity word is a single word that takes multiple parts of speech, a grammatical phenomenon found in Chinese; one example is a word whose verb and noun forms are identical. The invention discloses a Chinese ambiguity word processing method based on dependency parsing. The method comprises three steps. First, a large number of sentences containing ambiguity words are selected as a training corpus, and semantic role statistical rules for ambiguity words, with high accuracy and coverage, are obtained through dependency parsing. Second, the text to be processed is preprocessed by sentence segmentation, word segmentation, part-of-speech tagging and dependency parsing, and the parts of speech of the ambiguity words in the word segmentation result are tagged on the basis of a rule base of morphological, syntactic and contextual rules. Finally, the previously obtained semantic role statistical rules are applied, so that the part of speech of an ambiguity word in different contexts can be accurately recognized through dependency parsing. The method can effectively improve part-of-speech tagging accuracy in Chinese text processing and can be widely applied to various Chinese information processing systems.
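
The first and last steps of the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the training triples, the function names `learn_rules` and `retag`, the thresholds, and the use of a word's dependency relation as a stand-in for its "semantic role" are all assumptions made for illustration. The relation labels (SBV, VOB, HED, COO) follow LTP-style naming, and the inputs are assumed to have been produced by an external segmenter, tagger and dependency parser.

```python
from collections import Counter, defaultdict

# Hypothetical training triples: (ambiguity word, dependency relation, gold POS).
# In a real system these would be extracted from a dependency-parsed training corpus.
TRAINING = [
    ("建议", "SBV", "n"),
    ("建议", "SBV", "n"),
    ("建议", "VOB", "n"),
    ("建议", "HED", "v"),
    ("建议", "HED", "v"),
    ("建议", "HED", "v"),
    ("建议", "COO", "v"),
    ("建议", "COO", "n"),
]


def learn_rules(examples, min_accuracy=0.8, min_support=2):
    """Keep (word, relation) -> POS rules that are frequent and reliable enough.

    min_support is a crude proxy for coverage, min_accuracy for rule accuracy;
    both thresholds are illustrative assumptions.
    """
    counts = defaultdict(Counter)
    for word, deprel, pos in examples:
        counts[(word, deprel)][pos] += 1

    rules = {}
    for key, pos_counts in counts.items():
        best_pos, best_n = pos_counts.most_common(1)[0]
        total = sum(pos_counts.values())
        if total >= min_support and best_n / total >= min_accuracy:
            rules[key] = best_pos
    return rules


def retag(tokens, rules):
    """tokens: list of (word, initial_pos, deprel) from preprocessing.

    A token keeps its initial tag unless a learned rule matches it.
    """
    return [(word, rules.get((word, deprel), pos)) for word, pos, deprel in tokens]


rules = learn_rules(TRAINING)
# The preprocessing tagger is assumed to have mis-tagged the head verb as a noun.
sentence = [("我", "r", "SBV"), ("建议", "n", "HED"), ("休息", "v", "VOB")]
print(retag(sentence, rules))
```

In this sketch only rules that clear both thresholds survive, so a relation seen once (VOB) or with evenly split tags (COO) yields no rule, and tokens without a matching rule fall back to the tag assigned during preprocessing.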

Description

Technical field

[0001] The invention relates to the field of natural language processing, in particular to part-of-speech consistency checking in Chinese information processing and to the processing of Chinese ambiguity words.

Background technique

[0002] An ambiguity word is a word that has two or more grammatical functions in different contexts; that is, the word takes different parts of speech in different contexts. Specifically, the uses of an ambiguity word share the same pronunciation and the same written form, and their meanings are connected: same sound, same form, and meanings that are not identical but closely related. Ambiguity words are a distinctive grammatical phenomenon in Chinese. In English, words with the same root but different parts of speech usually have different morphological forms. In Chinese, however, the phenomenon of a single word carrying multiple parts of speech is more c...
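
The definition in [0002] can be made concrete with a small sketch that collects candidate ambiguity words from a tagged corpus: any word observed with two or more parts of speech. The corpus, the tag names (`n`, `v`) and the function name `find_ambiguity_words` are hypothetical and serve only as an illustration; they are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical (word, POS) pairs from an already tagged corpus.
TAGGED = [
    ("建议", "v"), ("建议", "n"),
    ("领导", "n"), ("领导", "v"),
    ("桌子", "n"), ("桌子", "n"),
]


def find_ambiguity_words(tagged_tokens):
    """Return words seen with at least two distinct POS tags, with those tags."""
    tags_by_word = defaultdict(set)
    for word, pos in tagged_tokens:
        tags_by_word[word].add(pos)
    return {w: sorted(tags) for w, tags in tags_by_word.items() if len(tags) >= 2}


candidates = find_ambiguity_words(TAGGED)
print(candidates)
```

A count-based filter like this only proposes candidates. It cannot tell true ambiguity words (related meanings) from unrelated homographs, which is a distinction the definition above depends on.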

Application Information

IPC(8): G06F17/27
Inventor: 刘峤, 刘瑶, 秦志光; other inventors requested that their names not be disclosed
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA