Text error correction method based on XLNet-BiGRU

A Chinese text error correction technique based on XLNet-BiGRU. It addresses the limited applicability and long running time of existing methods, with the effect of reducing the time needed for correction.

Pending Publication Date: 2022-02-18
JIANGSU FUTURE NETWORKS INNOVATION

AI Technical Summary

Problems solved by technology

[0004] Traditional text error correction mainly uses rule-based or translation-model-based methods. Rule-based methods rely on manually defined replacement-word dictionaries and can correct only certain types of errors. Translation-model-based correction is currently the mainstream approach, and neural translation models have replaced statistical translation models for this task. This approach treats text error correction as a translation problem from erroneous sentences to correct sentences. Although it performs well and produces fluent sentences, it requires a large amount of training data and is time-consuming to use.
In addition, when only spelling errors need to be corrected, sequence labeling is the main method in use; it corrects typos quickly but is not suitable for other types of errors.

Examples

Embodiment 1

[0049] As shown in Figure 1, the XLNet-BiGRU-based text error correction method of the present invention comprises the following steps:

[0050] S1. Train an XLNet (Generalized Autoregressive Pretraining for Language Understanding) Chinese model on a large-scale unlabeled corpus.

[0051] The XLNet model mainly comprises three core components: the Permutation Language Model, Two-Stream Self-Attention, and Transformer-XL.

[0052] Further, the permutation language model in XLNet randomly shuffles the order of the Chinese characters in each sentence of the text, so that for a Chinese character $x_i$, the characters $\{x_{i+1}, \ldots, x_n\}$ that originally appeared after it can also appear before it. Let $A_T$ denote the set of all permutations of the index sequence $[1, 2, \ldots, T]$ of a text of length $T$, let $a_t$ be the $t$-th element of a permutation $a \in A_T$, and let $a_{<t}$ denote the elements preceding $a_t$. This modeling process can be expressed as:

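[0053] $$\max_{\theta} \; \mathbb{E}_{a \sim A_T} \left[ \sum_{t=1}^{T} \log p_{\theta}\left(x_{a_t} \mid x_{a_{<t}}\right) \right]$$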

[0054] where θ denotes the model parameters to be trained.
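The permutation language modeling idea described above can be illustrated with a small Python sketch. It is not the patent's implementation: `toy_log_prob` is a hypothetical stand-in for the trained XLNet conditional probability (here a uniform distribution over the characters of the sentence), and the function names are chosen only for illustration. Positions are 0-based in the code, whereas the text uses 1-based indices.

```python
import math
import random


def factorization_contexts(order):
    """Pair each target position a_t with the positions a_{<t} it may condition on."""
    return [(order[t], sorted(order[:t])) for t in range(len(order))]


def toy_log_prob(target, context, tokens):
    """Hypothetical stand-in for log p_theta(x_{a_t} | x_{a_<t}).

    Assumption: a uniform distribution over the distinct characters of the
    sentence. A real system would query a trained XLNet model here.
    """
    vocab_size = len(set(tokens))
    return -math.log(vocab_size)


def permutation_lm_objective(tokens, num_samples=4, seed=0):
    """Monte Carlo estimate of E_{a ~ A_T}[sum_t log p(x_{a_t} | x_{a_<t})]."""
    rng = random.Random(seed)
    positions = list(range(len(tokens)))
    total = 0.0
    for _ in range(num_samples):
        order = positions[:]
        rng.shuffle(order)  # sample one factorization order a from A_T
        total += sum(
            toy_log_prob(target, context, tokens)
            for target, context in factorization_contexts(order)
        )
    return total / num_samples


if __name__ == "__main__":
    # For the order [2, 0, 1], position 2 is predicted first with no context,
    # then position 0 given position 2, then position 1 given positions 0 and 2.
    print(factorization_contexts([2, 0, 1]))
    print(permutation_lm_objective(list("今天天气好")))
```

Because the context of a target position depends on the sampled order, a character that originally appeared later in the sentence can serve as context for an earlier one, which is the property paragraph [0052] describes.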

[00...

Abstract

The invention provides a text error correction method based on XLNet-BiGRU, comprising the following steps: S1, training an XLNet (Generalized Autoregressive Pretraining for Language Understanding) Chinese model on a large-scale unlabeled corpus, wherein the XLNet model mainly comprises a permutation language model (Permutation Language Model), a two-stream attention mechanism (Two-Stream Self-Attention), and Transformer-XL as core components; S2, preprocessing and labeling the text error correction corpus data; and S3, constructing an XLNet-BiGRU neural network model on the basis of the XLNet pre-trained Chinese model from S1, wherein the model is mainly composed of a detection network and an error correction network and is trained using the labeled data from S2. The method addresses the long running time of traditional translation-model-based error correction: using the XLNet neural network, text error correction is changed from a serial process that generates the correct sentence one token at a time into a parallel process that corrects only the erroneous content.
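The detect-then-correct structure summarized in the abstract can be sketched as follows. This is a toy illustration under loud assumptions: `TOY_CORRECTIONS` is a hypothetical lookup table standing in for what the detection and error correction networks would learn, the function names are invented for this example, and only one-to-one character substitutions are handled.

```python
# Hypothetical stand-in for learned knowledge:
# (previous character, character) -> replacement character.
TOY_CORRECTIONS = {("高", "心"): "兴", ("天", "汽"): "气"}


def detect_errors(text):
    """Stand-in for the detection network: return the indices flagged as wrong."""
    return [
        i for i in range(1, len(text))
        if (text[i - 1], text[i]) in TOY_CORRECTIONS
    ]


def correct_errors(text, flagged):
    """Stand-in for the error correction network: rewrite only flagged positions.

    Each flagged position is resolved from the original text alone, so the
    positions are independent of one another and could be processed in parallel.
    """
    chars = list(text)
    for i in flagged:
        chars[i] = TOY_CORRECTIONS[(text[i - 1], text[i])]
    return "".join(chars)


def detect_then_correct(text):
    """Run detection first, then correct only what was flagged."""
    return correct_errors(text, detect_errors(text))


if __name__ == "__main__":
    print(detect_then_correct("今天天汽很好，我很高心"))
```

In contrast to a translation model, which regenerates every character of the output in sequence, this structure leaves unflagged characters untouched and does work only at the flagged positions.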

Description

Technical field

[0001] The invention relates to the fields of artificial intelligence and natural language processing, in particular to an XLNet-BiGRU text error correction method.

Background

[0002] Text error correction is a natural language processing technology that corrects erroneous content in text, specifically including spelling error correction, grammatical error correction, and semantic-pragmatic error correction in specific scenarios. Spelling error correction does not change the length of the text; it only corrects typos one-to-one. Grammatical error correction and semantic-pragmatic error correction must handle errors such as redundant words, missing words, word misuse, and wrong word order, and may therefore change the length of the text.

[0003] In recent years, large-scale deep pre-trained language models such as BERT and XLNet have promoted the rapid development of the fiel...
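The difference between length-preserving spelling errors and length-changing errors described in paragraph [0002] can be illustrated with a short sketch that compares an erroneous sentence with its corrected form. This is an assumption-laden example, not the labeling procedure of the patent: it uses Python's standard `difflib.SequenceMatcher` for alignment, and the helper names are hypothetical.

```python
from difflib import SequenceMatcher


def edit_operations(wrong, right):
    """Return the non-matching edit operations that turn `wrong` into `right`."""
    matcher = SequenceMatcher(None, wrong, right, autojunk=False)
    return [
        (tag, wrong[i1:i2], right[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def is_spelling_only(wrong, right):
    """True when every edit is a same-length replacement.

    In that case the text length is unchanged and characters are corrected
    one-to-one. Insertions and deletions (missing or redundant words) make
    this False.
    """
    return all(
        tag == "replace" and len(a) == len(b)
        for tag, a, b in edit_operations(wrong, right)
    )


if __name__ == "__main__":
    print(edit_operations("我很高心", "我很高兴"))   # a one-to-one typo
    print(is_spelling_only("我很高心", "我很高兴"))
    print(edit_operations("我很高", "我很高兴"))     # a missing character
    print(is_spelling_only("我很高", "我很高兴"))
```

A check of this kind could be used when preparing a corpus to separate sentence pairs that a pure sequence labeling approach can handle from those that need insertions or deletions.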

Application Information

IPC(8): G06F16/33, G06F16/35, G06F40/211, G06F40/216, G06F40/30, G06K9/62, G06N3/04, G06N3/08
CPC: G06F16/3344, G06F16/3346, G06F16/35, G06F40/211, G06F40/216, G06F40/30, G06N3/08, G06N3/047, G06N3/044, G06F18/2415, G06F18/241
Inventor 王伦张发雨王宁党章吴兴龙孟奥冯立二杨正云
Owner JIANGSU FUTURE NETWORKS INNOVATION