Uygur language part-of-speech tagging method

A part-of-speech tagging method for the Uyghur language, applicable to special data processing applications, instruments, electrical digital data processing, and similar fields. It addresses the near-total lack of research on Uyghur part-of-speech tagging.

Active Publication Date: 2014-07-02
国网新疆电力有限公司信息通信公司 +1

AI Technical Summary

Problems solved by technology

Scholars in China and abroad have carried out extensive, detailed research on English and Chinese, but comparable research on Uyghur is largely absent.




Embodiment Construction

[0014] A Uyghur part-of-speech tagging method comprising the following steps:
1. Formulate a Uyghur part-of-speech tag set and build a million-word-scale Uyghur corpus.
2. For first-level tagging, build the Uyghur part-of-speech tagging model with a conditional random field (CRF) method, which offers flexible feature extraction and high accuracy.
3. Construct a correct-tagging rule base, an unambiguous part-of-speech tagging dictionary, and a proper noun dictionary, and use them to build a rule- and dictionary-based correction algorithm that further improves the accuracy of first-level tagging.
4. Provide a part-of-speech tagging method based on word stem extraction to further increase the coverage of tagged words.
5. Provide a second-level statistical part-of-speech tagging model to increase the coverage and success rate of tagged words.
6. In second-level tagging, tag first through the unambiguous word dictionary and the proper noun dictionary, and then the stem ext...
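Steps 2 and 3 can be illustrated with a minimal Python sketch. It is not the patent's implementation: the feature names, the tag labels (`PRON`, `N`, `NP`), the sample words, and the function names `token_features` and `correct_tags` are all hypothetical, and no CRF is trained here. The first function only shows the kind of per-token context features a CRF tagger typically consumes; the second shows one plausible way dictionary lookups could override first-level tags.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int) -> Dict[str, str]:
    """Build context features for token i (hypothetical feature set)."""
    word = tokens[i]
    return {
        "word": word,
        "prefix3": word[:3],
        "suffix3": word[-3:],  # suffixes carry much of the grammar in an agglutinative language
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


def correct_tags(
    tokens: List[str],
    tags: List[str],
    unambiguous: Dict[str, str],
    proper_nouns: Set[str],
) -> List[str]:
    """Override first-level tags where a dictionary gives a certain answer."""
    corrected = []
    for word, tag in zip(tokens, tags):
        if word in unambiguous:
            corrected.append(unambiguous[word])  # word has exactly one possible tag
        elif word in proper_nouns:
            corrected.append("NP")  # hypothetical proper-noun label
        else:
            corrected.append(tag)  # keep the first-level (CRF) decision
    return corrected


if __name__ == "__main__":
    sentence = ["men", "kitab", "oqudum"]  # illustrative tokens only
    print(token_features(sentence, 0))
    print(correct_tags(sentence, ["N", "N", "V"], {"men": "PRON"}, set()))
```

The rule base mentioned in step 3 is omitted because the source does not describe the form of its rules.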



Abstract

The invention discloses a Uygur language part-of-speech tagging method. The method includes: 1, formulating a Uygur language part-of-speech tag set and a million-word Uygur language corpus; 2, in primary tagging, building a Uygur language part-of-speech tagging model with a method based on conditional random fields, which is flexible in feature extraction and high in accuracy; 3, building a correct-tagging rule library, an unambiguous part-of-speech tagging dictionary, and a proper noun dictionary, and building a rule- and dictionary-based correction algorithm for primary part-of-speech tagging to further improve its accuracy; 4, providing a part-of-speech tagging method based on stem extraction to further increase the coverage rate of tagged words; 5, providing a secondary part-of-speech tagging statistical model to increase the coverage rate and success rate of tagged words; 6, in secondary tagging, tagging through the unambiguous dictionary and the proper noun dictionary, and then achieving secondary part-of-speech tagging with extremely high accuracy through stem extraction tagging and statistical model tagging. The method solves the problem of part-of-speech tagging of the Uygur language efficiently.
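The secondary tagging cascade in step 6 can be sketched as a chain of fallbacks. The following Python is a minimal illustration under stated assumptions, not the patented algorithm: the suffix list, tag labels, sample words, and the names `strip_suffix` and `tag_second_level` are hypothetical; the longest-suffix stripping stands in for whatever stem extraction the invention uses; and the statistical model is reduced to a caller-supplied function. The lookup order between the two dictionaries is also an assumption.

```python
from typing import Callable, Dict, Sequence, Set, Tuple

# Hypothetical toy suffix list; a real stemmer would need a full suffix inventory.
SUFFIXES = ("lar", "ler", "ni", "da", "de")


def strip_suffix(word: str, suffixes: Sequence[str] = SUFFIXES) -> str:
    """Remove the longest matching suffix, keeping at least two stem characters."""
    for suf in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suf) and len(word) - len(suf) >= 2:
            return word[: -len(suf)]
    return word


def tag_second_level(
    word: str,
    unambiguous: Dict[str, str],
    proper_nouns: Set[str],
    stem_tags: Dict[str, str],
    fallback: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (tag, source), trying each stage of the cascade in turn."""
    if word in unambiguous:
        return unambiguous[word], "unambiguous_dict"
    if word in proper_nouns:
        return "NP", "proper_noun_dict"  # hypothetical proper-noun label
    stem = strip_suffix(word)
    if stem in stem_tags:
        return stem_tags[stem], "stem"  # tag inherited from the extracted stem
    return fallback(word), "statistical"  # stand-in for the statistical model


if __name__ == "__main__":
    result = tag_second_level(
        "kitablar", {"men": "PRON"}, {"Ürümchi"}, {"kitab": "N"}, lambda w: "X"
    )
    print(result)
```

Returning the source of each decision alongside the tag makes it easy to measure how much coverage each stage contributes, which is the quantity steps 4 and 5 aim to increase.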

Description

Technical field

[0001] The invention relates to language information processing technology, in particular to a part-of-speech tagging method for the Uyghur language.

Background

[0002] Today, as the national economy and society become informatized, massive amounts of information are generated, stored, and disseminated every day, and people face an unprecedented information explosion. How to find the information one needs, in a form one can understand, within this mass of information has become a common concern and a problem that information processing must solve. At present, natural language processing has become a prominent research hotspot in the field of information processing.

[0003] The Xinjiang Uygur Autonomous Region is home to many ethnic groups living in concentrated communities. Of its current population of more than 20 million, more than 13 million belong to ethnic minorities, acco...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8) G06F17/27
Inventor: 尼加提·纳吉米, 买合木提·买买提, 帕肉克·司地克, 马斌
Owner: 国网新疆电力有限公司信息通信公司