Text normalizing method and device, electronic equipment and storage medium

A text regularization technology, applied in the field of natural language processing, that addresses problems such as low efficiency and poor accuracy.

Pending Publication Date: 2020-10-27
UNIV OF SCI & TECH OF CHINA +1

AI Technical Summary

Problems solved by technology

[0004] Embodiments of the present invention provide a text regularization method, device, electronic equipment, and s



Examples


Embodiment Construction

[0045] To make the purpose, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on these embodiments, without creative effort, fall within the protection scope of the present invention.

[0046] With the rapid development of Internet technology, people are exposed to massive amounts of text information every day. However, this text usually contains a lot of noise, such as words without substantial meaning and repeated descriptions with similar semantics, especially the text obtained through speech recognition, in whic...



Abstract

The embodiments of the invention provide a text regularization method and device, electronic equipment and a storage medium. The method comprises: determining a to-be-regularized text; and inputting the to-be-regularized text into a text regularization model to obtain the regularized text output by the model. The text regularization model is trained on to-be-regularized sample texts, the corresponding regularized sample texts, and a sample editing type for each segmented word in the to-be-regularized sample texts. The model determines an editing type for each segmented word in the to-be-regularized text, determines a regularization mode based on whether the text contains any segmented word whose editing type is the insertion type, and regularizes the text according to that mode. The method, device, electronic equipment and storage medium provided by the embodiments improve the accuracy and efficiency of text regularization.
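The control flow described in the abstract can be illustrated with a minimal Python sketch. The abstract only names the insertion edit type and does not say what the regularization modes are, so everything else here is an assumption made for illustration: the `KEEP` and `DELETE` labels, the two mode names `"tag"` and `"generate"`, and the `generate` callback that stands in for whatever the model does when insertions are needed. The edit types are passed in directly rather than predicted by a trained model.

```python
from enum import Enum
from typing import Callable, List, Sequence


class EditType(Enum):
    KEEP = "keep"      # assumed label: token is copied unchanged
    DELETE = "delete"  # assumed label: token is dropped
    INSERT = "insert"  # named in the abstract: new text is needed here


def choose_mode(edit_types: Sequence[EditType]) -> str:
    """Pick a mode based on whether any token has the insertion type.

    The mode names are hypothetical; the abstract only states that the
    choice depends on the presence of insertion-type tokens.
    """
    return "generate" if EditType.INSERT in edit_types else "tag"


def regularize(
    tokens: Sequence[str],
    edit_types: Sequence[EditType],
    generate: Callable[[Sequence[str], Sequence[EditType]], List[str]],
) -> List[str]:
    """Regularize a tokenized text given one edit type per token."""
    if len(tokens) != len(edit_types):
        raise ValueError("tokens and edit_types must have the same length")
    if choose_mode(edit_types) == "tag":
        # No insertions: the output is a subsequence of the input tokens.
        return [t for t, e in zip(tokens, edit_types) if e is not EditType.DELETE]
    # Insertions present: defer to the (hypothetical) generation step.
    return generate(tokens, edit_types)


def no_generator(tokens: Sequence[str], edit_types: Sequence[EditType]) -> List[str]:
    """Placeholder generator for the usage example; not part of the source."""
    raise NotImplementedError("insertion handling is outside this sketch")


tokens = ["um", "I", "I", "want", "to", "go"]
edits = [EditType.DELETE, EditType.KEEP, EditType.DELETE,
         EditType.KEEP, EditType.KEEP, EditType.KEEP]
cleaned = regularize(tokens, edits, no_generator)  # ["I", "want", "to", "go"]
```

Under these assumptions, a text whose tokens only need to be kept or deleted never reaches the generation step, which is one plausible reading of how the method could reduce computation relative to a pure encoder-decoder approach.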

Description

Technical field

[0001] The present invention relates to the technical field of natural language processing, in particular to a text regularization method, device, electronic equipment and storage medium.

Background

[0002] Text regularization refers to deleting, while keeping the semantics of the text basically unchanged, words without substantive meaning, repetitive descriptions with similar semantics, words irrelevant to the topic, and colloquial words from the original text, as well as adjusting the word order of the original text, so that the regularized text is more formal, neat and concise.

[0003] Current text regularization methods usually rely on an encoder-decoder model (Encoder-Decoder) to convert text into regularized text. However, such a model is difficult to train, which results in poor regularization quality, and it is computationally expensive and therefore inefficient....


Application Information

IPC(8): G06F40/103, G06F40/253, G06F40/289, G06F40/30
CPC: G06F40/103, G06F40/253, G06F40/289, G06F40/30
Inventor: 戚婷, 万根顺, 高建清, 王智国, 胡国平
Owner: UNIV OF SCI & TECH OF CHINA