A Chinese typo correction method and system based on word segmentation enhancement

A Chinese typo correction method built on word segmentation, classified under neural learning methods, semantic analysis, and electrical digital data processing. It addresses the problem that word segmentation tools cannot predict correct segmentation results from text containing typos, and aims to ensure the correctness of the corrected text.

Active Publication Date: 2022-07-15
长沙市智为信息技术有限公司

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is that a word segmentation tool cannot predict the correct word segmentation result from text that contains errors. To overcome it, the invention provides a Chinese spelling check method and system based on word segmentation enhancement.



Examples


Embodiment 1

[0053] As shown in Figure 1, the method for correcting Chinese typos based on word segmentation enhancement provided by the disclosed embodiments of the present invention includes:

[0054] S1. Obtain the original text containing Chinese typos;

[0055] S2. Use the first text encoding module in the word segmentation module to obtain the first hidden state of the original text, and predict the word segmentation result of the target text according to the first hidden state of the original text;

[0056] The word segmentation module includes a first text encoding module and a word segmentation network module, and the first text encoding module includes a first embedding layer and an encoder;

[0057] Obtain the character sequence, segment sequence and position sequence corresponding to the original text according to the original text;

[0058] Calculate the first embedding vector by using the first embedding layer according to the character sequence, segmen...
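
The sequence construction in [0057] and the embedding step in [0058] can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the single-segment ids (all zeros), the zero-based positions, the element-wise sum of three lookups (as in BERT-style embedding layers), the toy random tables, and every function name here are hypothetical.

```python
import random


def build_sequences(text):
    """Return (character sequence, segment sequence, position sequence)."""
    chars = list(text)                    # one entry per character
    segments = [0] * len(chars)           # assumption: a single segment with id 0
    positions = list(range(len(chars)))   # assumption: zero-based positions
    return chars, segments, positions


def make_table(keys, dim, seed):
    """Hypothetical embedding table: one small random vector per key."""
    rng = random.Random(seed)
    return {key: [rng.uniform(-1, 1) for _ in range(dim)] for key in keys}


def first_embedding(text, dim=4):
    """Sum character, segment, and position embeddings for each character.

    The sum is an assumed way of combining the three lookups.
    """
    chars, segments, positions = build_sequences(text)
    char_table = make_table(sorted(set(chars)), dim, seed=0)
    seg_table = make_table(sorted(set(segments)), dim, seed=1)
    pos_table = make_table(positions, dim, seed=2)
    return [
        [c + s + p for c, s, p in zip(char_table[ch], seg_table[sg], pos_table[ps])]
        for ch, sg, ps in zip(chars, segments, positions)
    ]
```

Calling `first_embedding` on a sentence yields one vector per character; in a trained model the tables would be learned parameters rather than random values.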

Embodiment 2

[0080] As shown in Figure 2, another method for correcting Chinese typos based on word segmentation enhancement provided by the disclosed embodiments of the present invention includes:

[0081] S1. Obtain the original text containing Chinese typos;

[0082] Here, the original text is X = {x_1, x_2, …, x_n}, where n is the length of the original text and x_i is the i-th character of the original text, i∈{1,2,…,n}.

[0083] S2. Use the first text encoding module in the word segmentation module to obtain the first hidden state of the original text, and predict the word segmentation result of the target text according to the first hidden state of the original text;

[0084] The word segmentation module includes a first text encoding module and a word segmentation network module; the first text encoding module is a BERT module, which includes a first embedding layer and an encoder; the encoder is a BERT model.

[0085] According to the input requirements of the first embedding layer of the BERT module, t...
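
Step S2 predicts a word segmentation from per-character hidden states. One common way to frame this is per-character tagging followed by decoding, sketched below. The BMES tag set, the argmax decision, and all names are assumptions made for illustration; the visible text does not state which labeling scheme the patent uses.

```python
TAGS = ["B", "M", "E", "S"]  # assumed tag set: Begin, Middle, End, Single


def linear(vector, weights, bias):
    """Fully connected layer: one score per tag for one hidden-state vector."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def predict_tags(hidden_states, weights, bias):
    """Choose the highest-scoring tag for each character."""
    tags = []
    for hidden in hidden_states:
        scores = linear(hidden, weights, bias)
        tags.append(TAGS[scores.index(max(scores))])
    return tags


def tags_to_words(chars, tags):
    """Group characters into words from their tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:   # a new word starts: flush the open one
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):               # the word ends on this character
            words.append(current)
            current = ""
    if current:                             # tolerate a sequence ending without E/S
        words.append(current)
    return words
```

Because the tags are predicted for the target text, the segmentation can describe the intended words even where the original characters are wrong.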

Embodiment 3

[0144] As shown in Figure 3, this exemplary embodiment also provides a Chinese typo correction system 100 based on word segmentation enhancement, which includes a word segmentation module 110 and a correction module 120. The word segmentation module 110 predicts the word segmentation result of the target text according to the original text; the correction module 120 corrects the original text according to the word segmentation result and outputs the target text.

[0145] In this embodiment, the word segmentation module 110 includes:

[0146] The first text encoding module 111, which includes a first embedding layer and an encoder; the first embedding layer is used to obtain a first text embedding vector; the encoder is used to obtain the first hidden state of the original text according to the text embedding vector;

[0147] The word segmentation network module 112 is used to predict the word segmentation result of the target text through the fully conn...
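
The composition of system 100 can be sketched as plain Python classes. This is a structural illustration only: the callables stand in for the trained encoders and networks, passing the first hidden state on to the correction module follows the abstract's step S3, and all class, field, and type names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Hidden = List[List[float]]  # assumed shape: one vector per character


@dataclass
class WordSegmentationModule:
    """Stands in for module 110."""
    encode: Callable[[str], Hidden]          # placeholder for encoding module 111
    segment: Callable[[Hidden], List[str]]   # placeholder for network module 112

    def __call__(self, original: str) -> Tuple[List[str], Hidden]:
        first_hidden = self.encode(original)
        return self.segment(first_hidden), first_hidden


@dataclass
class CorrectionModule:
    """Stands in for module 120."""
    encode: Callable[[str, List[str], Hidden], Hidden]
    correct: Callable[[str, Hidden], str]

    def __call__(self, original: str, segmentation: List[str],
                 first_hidden: Hidden) -> str:
        final_hidden = self.encode(original, segmentation, first_hidden)
        return self.correct(original, final_hidden)


@dataclass
class TypoCorrectionSystem:
    """Stands in for system 100: segmentation first, then correction."""
    segmenter: WordSegmentationModule
    corrector: CorrectionModule

    def run(self, original: str) -> str:
        segmentation, first_hidden = self.segmenter(original)
        return self.corrector(original, segmentation, first_hidden)
```

Keeping the two modules behind small callable interfaces makes the data flow explicit: the correction module receives the original text, the predicted segmentation, and the first hidden state.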


Abstract

The invention relates to a Chinese typo correction method and system based on word segmentation enhancement. The method includes the following steps: S1, obtaining an original text containing Chinese typos; S2, using a first text encoding module in a word segmentation module to obtain a first hidden state of the original text, and predicting the word segmentation result of the target text according to that first hidden state; S3, using a pre-trained text encoding model in a correction module to obtain a final hidden state according to the original text, the word segmentation result, and the first hidden state; S4, using a correction network module in the correction module to correct the original text according to the final hidden state and obtain the target text. The method predicts the word segmentation result of the target text and can obtain the correct segmentation even when the original text contains typos, which provides effective information for the correction process and helps ensure the correctness of the target text.

Description

Technical field

[0001] The invention relates to the technical field of computer text processing, in particular to a method and system for correcting Chinese typos based on word segmentation enhancement.

Background

[0002] Typos in Chinese text impair its semantic expression and cause adverse effects in many scenarios. Chinese spell checking is widely used in search optimization, press release proofreading, and text correction for non-native learners, and is an important task in natural language processing.

[0003] Early Chinese spell checking methods followed a pipeline of error detection, candidate recall, and candidate ranking, and corrected errors using language models, word lists, and a large number of hand-crafted rules; their accuracy was unsatisfactory. With the development of deep learning, especially the development and wide application of pre-trained language models, Chines...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F40/232, G06F40/30, G06F40/289, G06N3/04, G06N3/08
CPC: G06F40/232, G06F40/30, G06F40/289, G06N3/08, G06N3/042, G06N3/047, G06N3/048, G06N3/045
Inventor 李芳芳单悠然黄惟康占英王青
Owner 长沙市智为信息技术有限公司