A method for generating typo-word knowledge based on a Chinese character confusion set

A technology involving confusion words and confusion sets, applied in digital data processing, special data processing applications, instruments, and similar fields. It addresses the monotony, high labor intensity, and low efficiency of manual proofreading, and its stated effects are comprehensive coverage, high effectiveness and accuracy, and precision.

Active Publication Date: 2018-05-22
苏州定一智能技术有限公司 (Suzhou Dingyi Intelligent Technology Co., Ltd.)
Cites: 7; Cited by: 0

AI Technical Summary

Problems solved by technology

At present, most text is proofread manually. Manual proofreading is monotonous, labor-intensive, and inefficient, and it can no longer keep up with the volume of text that needs checking. The study of automatic text proofreading therefore has far-reaching theoretical and practical significance.

Method used

A confusion word set is first generated from a correct-word dictionary and a Chinese character confusion set and pruned using a corpus and rules. The remaining confusion words are then segmented with forward maximum matching and verified with statistical knowledge against preset typo-word judgment rules to produce typo-word knowledge.

Embodiment Construction

[0039] The present invention will be further described below in conjunction with the accompanying drawings.

[0040] As shown in Figure 1, the typo-word knowledge generation method based on a Chinese character confusion set proposed by the present invention generates a confusion word set from the Chinese character confusion set and a Chinese dictionary, filters and prunes the generated confusion words, and finally validates the confusion words using statistical knowledge and rules to generate typo-word knowledge. The method includes the following steps:

[0041] Step 1: Use the Chinese dictionary and the Chinese character confusion set to generate a confusion word set, i.e., a set of confusion words.

[0042] The confusion set of a Chinese character is the set of Chinese characters whose pronunciation or shape is similar to that character. The Chinese character confusion set that a...
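Step 1 can be illustrated with a minimal Python sketch. It assumes, which the source does not state, that a confusion word is produced by replacing exactly one character of a dictionary word with a member of that character's confusion set. The function name `generate_confusion_words` and the toy dictionary and confusion set are hypothetical.

```python
from collections import defaultdict


def generate_confusion_words(dictionary, confusion_set):
    """Map each candidate confusion word to the correct words it may stand for.

    Hypothetical sketch: exactly one character is substituted per candidate.
    Candidates that happen to be real words are kept here; pruning is a later step.
    """
    candidates = defaultdict(set)
    for word in dictionary:
        for i, ch in enumerate(word):
            # Replace the i-th character with each similar-sounding or similar-looking one.
            for similar in confusion_set.get(ch, ()):
                if similar != ch:
                    candidates[word[:i] + similar + word[i + 1:]].add(word)
    return dict(candidates)


# Hypothetical toy data: two dictionary words and a small confusion set.
dictionary = {"部署", "布置"}
confusion_set = {"部": {"布", "步"}, "署": {"置", "暑"}, "布": {"部"}, "置": {"署"}}
result = generate_confusion_words(dictionary, confusion_set)
```

Here `result["部置"]` maps back to both 部署 and 布置, which is one reason the generated set needs filtering before it can serve as typo-word knowledge.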

Abstract

The invention discloses a method for generating typo-word knowledge based on a Chinese character confusion set. The method first uses a correct-word dictionary and a Chinese character confusion set to generate a confusion word set, then prunes the generated set using a corpus and rules as a preliminary filter. It next segments the confusion words that survive this filter with forward maximum matching, verifies them with statistical knowledge according to preset typo-word judgment rules, and finally generates typo-word knowledge. The method addresses the low efficiency and heavy workload of existing manual proofreading; using the typo-word knowledge it produces for automatic proofreading and error correction improves both the quality and the speed of error correction in automatic proofreading of Chinese text.
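The filtering and verification stages can be sketched as follows. The source names forward maximum matching (FMM) segmentation and statistical verification against preset typo-word judgment rules but does not spell out the rules here, so everything beyond FMM itself is a hypothetical stand-in: the pruning rules (drop candidates that are dictionary words or occur at least `max_freq` times in a corpus), the judgment rule (the candidate must not segment entirely into multi-character dictionary words, and its correct word must be at least `ratio` times more frequent), and all function names and thresholds.

```python
from collections import Counter


def fmm_segment(text, dictionary, max_len):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:  # single characters are the fallback
                tokens.append(piece)
                i += size
                break
    return tokens


def prune(candidates, dictionary, corpus_freq, max_freq):
    """Hypothetical preliminary filter: drop real words and candidates common in the corpus."""
    return {typo: words for typo, words in candidates.items()
            if typo not in dictionary and corpus_freq[typo] < max_freq}


def verify(candidates, dictionary, corpus_freq, ratio=10):
    """Hypothetical judgment rule producing {typo: correct word} knowledge."""
    max_len = max((len(w) for w in dictionary), default=1)
    knowledge = {}
    for typo, words in candidates.items():
        tokens = fmm_segment(typo, dictionary, max_len)
        if all(len(t) > 1 for t in tokens):
            continue  # reads as a sequence of valid words, so it is not treated as a typo
        # Pick the most frequent correct word; sorting makes ties deterministic.
        best = max(sorted(words), key=lambda w: corpus_freq[w])
        # Add-one smoothing so unseen words do not produce a zero comparison.
        if corpus_freq[best] + 1 >= ratio * (corpus_freq[typo] + 1):
            knowledge[typo] = best
    return knowledge


# Hypothetical toy data; corpus_freq would normally be counted from a large corpus.
dictionary = {"部署", "布置"}
candidates = {"部暑": {"部署"}, "部置": {"部署", "布置"}, "步署": {"部署"}}
corpus_freq = Counter({"部署": 500, "布置": 300, "步署": 200, "部暑": 2, "部置": 1})
knowledge = verify(prune(candidates, dictionary, corpus_freq, max_freq=100),
                   dictionary, corpus_freq)
```

With these toy numbers, 步署 is pruned as too frequent to count as an error, and both 部暑 and 部置 are accepted as typos of 部署. Choosing the most frequent correct word is a simplification; 部置 could equally be a typo of 布置.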

Description

Technical field

[0001] The invention relates to natural language processing within the field of artificial intelligence, specifically to the automatic proofreading of Chinese text, and in particular to a method for generating typo-word knowledge based on a Chinese character confusion set.

Background

[0002] With the rapid development of information processing technology and the Internet, traditional text work has been almost completely taken over by computers. Electronic texts such as e-books, e-newspapers, emails, and office documents keep appearing, and texts contain more and more errors. At present, most text is proofread manually, which is monotonous, labor-intensive, and inefficient, and manual proofreading can no longer meet the demand. The study of automatic text proofreading therefore has far-reaching theoretical and practical significance.

[0003] Realizing automatic proo...

Application Information

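Patent type & authority: Patents (China)
IPC(8): G06F17/27
CPC: G06F40/205
Inventors: 刘亮亮 (Liu Liangliang), 顾德之 (Gu Dezhi), 吴健康 (Wu Jiankang), 刘海波 (Liu Haibo), 张再跃 (Zhang Zaiyue), 张晓如 (Zhang Xiaoru)
Owner: 苏州定一智能技术有限公司 (Suzhou Dingyi Intelligent Technology Co., Ltd.)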