
Fuzzy word segmentation based non-multi-character word error automatic proofreading method

An automatic proofreading technology for non-multi-character word errors, applied in electrical digital data processing, special data processing applications, instruments, etc. It addresses the difficulty that such errors cause for Chinese text error checking, and achieves high effectiveness and accuracy with a fast response.

Active Publication Date: 2015-10-21
南方电网互联网服务有限公司
2 Cites, 29 Cited by

AI Technical Summary

Problems solved by technology

[0005] 2) Proofreading Chinese text first requires Chinese word segmentation. If a word contains a typo, segmentation splits it into a string of single characters. This is a non-multi-character word error, and it makes error checking of Chinese text difficult.
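The effect described above can be reproduced with a minimal sketch of forward maximum matching. The function name `forward_max_match`, the toy dictionary, and the example sentences are assumptions made for illustration; they do not come from the patent.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Greedy forward maximum matching.

    At each position, take the longest dictionary word; if nothing
    matches, fall back to a single-character token.
    """
    tokens = []
    i = 0
    while i < len(sentence):
        longest = min(max_len, len(sentence) - i)
        for size in range(longest, 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy dictionary of correct words.
dictionary = {"我们", "喜欢", "巧克力"}

clean = forward_max_match("我们喜欢巧克力", dictionary)
typo = forward_max_match("我们喜欢巧可力", dictionary)  # 克 mistyped as 可
print(clean)
print(typo)
```

With the typo, the word 巧克力 no longer matches the dictionary, so the segmenter emits three single-character tokens in its place. A run of single characters like this is the signal the proofreading method has to interpret.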



Examples


Embodiment Construction

[0041] The present invention will be further described below in conjunction with the accompanying drawings and embodiments.

[0042] The automatic proofreading method for non-multi-character word errors provided by the present invention is based on fuzzy word segmentation and comprises the following steps:

[0043] 1) Using a double-array Trie built from the correct-word dictionary and the wrong-word dictionary, the maximum matching method is applied to segment Chinese sentences precisely and to build the word graph of this exact segmentation. Segments that match the wrong-word dictionary are marked, and the correct words corresponding to the typo words matched in the sentence are added to the word graph. Specifically:

[0044] First, use the dictionary of correct words and the dictionary of wrong words to carry out precise wo...
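Step 1) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: a plain nested-dict trie stands in for the double-array Trie, the wrong-word dictionary is assumed to map each typo word to its correction, and the edge format `(start, end, word, kind)` as well as all function names are hypothetical.

```python
END = None  # terminal key; never collides with a character of the sentence


def build_trie(entries):
    """entries maps word -> correction (None for a correct word)."""
    root = {}
    for word, correction in entries.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = (word, correction)
    return root


def longest_match(trie, sentence, start):
    """Return (end, (word, correction)) for the longest word at start, or None."""
    node, best = trie, None
    for i in range(start, len(sentence)):
        node = node.get(sentence[i])
        if node is None:
            break
        if END in node:
            best = (i + 1, node[END])
    return best


def build_word_graph(sentence, correct_words, wrong_words):
    """Maximum-matching segmentation that records edges of a word graph."""
    entries = {w: None for w in correct_words}
    entries.update(wrong_words)
    trie = build_trie(entries)
    edges, i = [], 0
    while i < len(sentence):
        match = longest_match(trie, sentence, i)
        if match is None:
            edges.append((i, i + 1, sentence[i], "single"))
            i += 1
            continue
        end, (word, correction) = match
        if correction is None:
            edges.append((i, end, word, "correct"))
        else:
            # Mark the typo word and add its correction over the same span.
            edges.append((i, end, word, "wrong"))
            edges.append((i, end, correction, "correction"))
        i = end
    return edges


# Hypothetical toy dictionaries.
correct_words = {"我们", "喜欢", "巧克力"}
wrong_words = {"喜换": "喜欢"}
graph = build_word_graph("我们喜换巧克力", correct_words, wrong_words)
for edge in graph:
    print(edge)
```

Keeping the marked typo edge next to its correction leaves the final choice to a later scoring step rather than to the segmenter.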


Abstract

The invention discloses an automatic proofreading method for non-multi-character word errors based on fuzzy word segmentation. First, exact segmentation is carried out using a correct-word dictionary and a wrong-word dictionary to generate a word graph. Next, a fuzzy matching algorithm computes the similarity of Chinese word strings, the scattered single-character strings left by exact segmentation are fuzzy matched, and the fuzzy matching results are added to the word graph to form a fuzzy word graph. Finally, the shortest path through the fuzzy word graph is computed using a word bigram model combined with the similarity, which realizes automatic proofreading of Chinese non-multi-character word errors. With this method the system responds quickly, the precision meets the demands of practical applications, and the effectiveness and accuracy are high.
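The last two stages can be sketched as follows. The source does not specify the similarity measure or how similarity is combined with the bigram model, so this sketch assumes an edit-distance similarity and a cost of `-log(bigram) - log(similarity)`. The exact-segmentation edges, the bigram probabilities, the threshold, and all function names are hypothetical toy values.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def similarity(a, b):
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def fuzzy_edges(sentence, start, end, dictionary, threshold=0.6):
    """Propose dictionary words for sub-spans of the scattered run [start, end)."""
    edges = []
    for i in range(start, end):
        for j in range(i + 2, end + 1):
            for word in sorted(dictionary):
                sim = similarity(sentence[i:j], word)
                if sim >= threshold:
                    edges.append((i, j, word, sim))
    return edges


def shortest_path(n, edges, bigram, floor=1e-6):
    """Cheapest path from 0 to n; a state is (position, last word)."""
    best = {(0, "<s>"): (0.0, [])}
    for pos in range(n):
        states = [(k, v) for k, v in best.items() if k[0] == pos]
        for (_, prev_word), (cost, path) in states:
            for start, end, word, sim in edges:
                if start != pos:
                    continue
                step = -math.log(bigram.get((prev_word, word), floor)) - math.log(sim)
                key = (end, word)
                if key not in best or cost + step < best[key][0]:
                    best[key] = (cost + step, path + [word])
    finals = [v for k, v in best.items() if k[0] == n]
    return min(finals, key=lambda v: v[0])[1] if finals else None


sentence = "我们喜欢巧可力"  # 克 mistyped as 可
dictionary = {"我们", "喜欢", "巧克力"}
# Exact segmentation is taken as given; exact edges carry similarity 1.0.
exact = [(0, 2, "我们", 1.0), (2, 4, "喜欢", 1.0),
         (4, 5, "巧", 1.0), (5, 6, "可", 1.0), (6, 7, "力", 1.0)]
fuzzy = fuzzy_edges(sentence, 4, 7, dictionary)
bigram = {("<s>", "我们"): 0.5, ("我们", "喜欢"): 0.4, ("喜欢", "巧克力"): 0.3}
result = shortest_path(len(sentence), exact + fuzzy, bigram)
print(result)
```

In this toy setup the fuzzy edge for 巧克力 wins because unseen bigrams over the single characters are charged the `floor` probability, while the fuzzy edge pays only a small similarity penalty.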

Description

Technical field

[0001] The invention relates to natural language processing in the field of computer artificial intelligence, in particular to the automatic proofreading of Chinese texts.

Background technique

[0002] With the rapid development of information processing technology and the Internet, traditional text work has been almost completely taken over by computers. Electronic texts such as e-books, e-newspapers, e-mails, office documents, blogs, and microblogs have all become part of people's daily lives. However, these texts contain more and more errors, which poses a great challenge for proofreading. Traditional manual proofreading is inefficient, labor-intensive, and slow, and clearly cannot meet the demand for text proofreading.

[0003] Automatic text proofreading is one of the main applications of natural language processing, and it is also a difficult problem in natural language understanding. With the development of technology, the automat...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 刘亮亮, 吴健康
Owner: 南方电网互联网服务有限公司