Chinese text word order adjustment and quantifier completion method and system

A Chinese text processing technology, applied in natural language data processing, instrumentation, electronic digital data processing, and related fields. It addresses the problem of limited annotated corpora and achieves a simplified structure, broad versatility, and reduced time and labor costs.

Active Publication Date: 2021-02-26
INST OF COMPUTING TECH CHINESE ACAD OF SCI

AI Technical Summary

Problems solved by technology

[0006] To address the problem of limited annotated corpora, the present invention provides a method that uses a small amount of unlabeled corpus data to adjust the word order of Chinese text and to locate and complete missing quantifiers.

Method used



Examples


Embodiment Construction

[0030] To address the problem of limited labeled corpora, the present invention provides a method that uses a small amount of unlabeled data to adjust the word order of Chinese text and to locate and complete missing quantifiers. The invention has two stages, preparation and correction. The preparation stage needs to be executed only once; each execution of the correction stage corrects one sentence of Chinese text.
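As a rough illustration of one preparation-stage step, the sketch below builds a training pair for quantifier completion by deleting quantifiers from a tokenized sentence, in the spirit of the parallel corpus the abstract describes. The quantifier set, the function name make_parallel_pair, and the per-token label scheme are assumptions made for illustration; the source does not specify the data format used to train the model.

```python
from typing import List, Set, Tuple

# Hypothetical quantifier list; the patent derives its list by labeling a corpus.
QUANTIFIERS: Set[str] = {"个", "本", "只", "条"}


def make_parallel_pair(
    tokens: List[str], quantifiers: Set[str]
) -> Tuple[List[str], List[str]]:
    """Build one (source, labels) training pair from a tokenized sentence.

    source is the sentence with quantifiers removed. labels[i] is the
    quantifier that originally followed source[i], or "" if there was none.
    A quantifier at the very start of the sentence has no preceding token,
    so this sketch leaves it in place as an ordinary token.
    """
    source: List[str] = []
    labels: List[str] = []
    for tok in tokens:
        if tok in quantifiers and source:
            # Record the deleted quantifier on the token that preceded it.
            labels[-1] = tok
        else:
            source.append(tok)
            labels.append("")
    return source, labels


pair = make_parallel_pair(["我", "买", "了", "三", "本", "书"], QUANTIFIERS)
```

Pairs of this shape could be fed to a sequence labeling model, which then predicts both where a quantifier is missing and which quantifier to insert.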

[0031] 1. Preparation stage

[0032] (1) Design word order adjustment rules

[0033] Word order adjustment rules are general rules for transforming the word order of text in one language into the word order of text in another language. The word order of a text can be formally defined through its grammatical structure. A sentence is composed of words, and its grammatical structure comprises the syntactic structure of the sentence and the dependency relationships between its words. For example, the word o...
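A minimal sketch of how such rules might produce a candidate set from a part-of-speech-tagged sentence is given below. The two swap rules, the tag names ("n", "v", "d"), and the helper names are hypothetical; the actual rules designed in the patent are not visible in this excerpt.

```python
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag) pairs
Rule = Callable[[Tagged], List[Tagged]]


def swap_adjacent(sent: Tagged, first_tag: str, second_tag: str) -> List[Tagged]:
    """Return one reordered sentence per adjacent (first_tag, second_tag) pair,
    with the two matching words swapped."""
    results: List[Tagged] = []
    for i in range(len(sent) - 1):
        if sent[i][1] == first_tag and sent[i + 1][1] == second_tag:
            results.append(sent[:i] + [sent[i + 1], sent[i]] + sent[i + 2:])
    return results


# Hypothetical rules: move an object noun after its verb, and move an
# adverb in front of the verb it follows.
RULES: List[Rule] = [
    lambda s: swap_adjacent(s, "n", "v"),
    lambda s: swap_adjacent(s, "v", "d"),
]


def generate_candidates(sent: Tagged, rules: List[Rule]) -> List[Tagged]:
    """Apply every rule once to the original sentence and collect the
    distinct results, keeping the original as the first candidate."""
    candidates: List[Tagged] = [sent]
    for rule in rules:
        for new_sent in rule(sent):
            if new_sent not in candidates:
                candidates.append(new_sent)
    return candidates


tagged = [("我", "r"), ("苹果", "n"), ("吃", "v")]
candidates = generate_candidates(tagged, RULES)
words = [[w for w, _ in c] for c in candidates]
```

Keeping the original sentence in the candidate set means a later scoring step can still choose to leave the word order unchanged.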



Abstract

The invention provides a method and system for Chinese text word order adjustment and quantifier completion. In the preparation stage, word sequences from a Chinese corpus are input into an N-gram language model to obtain an N-gram word list for the corpus. The quantifiers in the corpus are labeled to form a quantifier list, and the labeled quantifiers are then deleted to produce a second corpus that forms a parallel corpus with the original. A bidirectional long short-term memory (BiLSTM) model is trained on this parallel corpus to obtain a quantifier completion model. In the correction stage, the Chinese text to be adjusted is part-of-speech tagged, and its sentence structure and order are adjusted according to word order adjustment rules to form a candidate set of several new texts. A beam search over this candidate set, using the N-gram word list and selecting words according to probability, generates the sentence with the maximum probability under the Chinese corpus as the word order adjustment result. Finally, the quantifier completion model locates and fills the positions in that result where quantifiers are missing.
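The scoring step in the abstract can be sketched as follows, assuming a bigram model (N = 2) with add-one smoothing and treating the search over the candidate set as a beam search whose prefixes are restricted to those that occur in some candidate. The toy corpus, the function names, and the smoothing choice are illustrative assumptions, not details taken from the patent.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

BOS, EOS = "<s>", "</s>"
Model = Tuple[Counter, Counter, int]  # (context counts, bigram counts, vocab size)


def train_bigrams(corpus: Sequence[Sequence[str]]) -> Model:
    """Count bigrams and their left contexts over a tokenized corpus."""
    contexts, bigrams = Counter(), Counter()
    vocab = {EOS}
    for sent in corpus:
        padded = [BOS] + list(sent) + [EOS]
        contexts.update(padded[:-1])
        bigrams.update(zip(padded, padded[1:]))
        vocab.update(sent)
    return contexts, bigrams, len(vocab)


def log_prob(prev: str, word: str, model: Model) -> float:
    """Add-one smoothed log P(word | prev)."""
    contexts, bigrams, vocab_size = model
    return math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))


def beam_search(candidates: List[List[str]], model: Model, beam_width: int = 2) -> List[str]:
    """Pick the most probable candidate, extending prefixes one word at a time.

    Assumes a non-empty candidate list whose sentences all have the same
    non-zero length (they are reorderings of the same words).
    """
    beam = [(0.0, ())]  # (log-probability, prefix)
    for pos in range(len(candidates[0])):
        extended = {}
        for score, prefix in beam:
            prev = prefix[-1] if prefix else BOS
            for cand in candidates:
                if tuple(cand[:pos]) == prefix:
                    extended[prefix + (cand[pos],)] = score + log_prob(prev, cand[pos], model)
        beam = sorted(((s, p) for p, s in extended.items()), reverse=True)[:beam_width]
    # Include the end-of-sentence transition when choosing the winner.
    best = max(beam, key=lambda sp: sp[0] + log_prob(sp[1][-1], EOS, model))
    return list(best[1])


corpus = [["我", "吃", "苹果"], ["他", "吃", "饭"], ["我", "喝", "水"]]
model = train_bigrams(corpus)
best = beam_search([["我", "苹果", "吃"], ["我", "吃", "苹果"]], model)
```

With a small candidate set the beam rarely prunes anything, so this behaves like scoring every candidate; the beam width only matters once the rules generate many reorderings.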

Description

Technical field

[0001] The invention relates to the field of natural language processing, and specifically to a method and a system for word order adjustment and quantifier completion of Chinese text in low-resource settings.

Background art

[0002] With the accumulation of large corpora and the development of machine learning algorithms such as deep learning, natural language processing (NLP) technology is becoming increasingly mature. Text error correction (Grammatical Error Correction, GEC) is a classic problem in the field. Its purpose is to automatically correct grammatical errors in text, improving language correctness and reducing the cost of manual proofreading. For example, when deaf-mute people write Chinese, they tend to follow sign language grammar, which leads to problems such as word order that deviates from standard Chinese and missing quantifiers.

[0003] According to the Chinese grammatical system...

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F40/166, G06F40/284, G06F40/211, G06F40/237, G06F40/253
CPC: G06F40/166, G06F40/284, G06F40/211, G06F40/237, G06F40/253, Y02D10/00
Inventor: 陈益强, 龙广玉, 邢云冰
Owner: INST OF COMPUTING TECH CHINESE ACAD OF SCI