Text normalization method and system based on WFST

A rule-based text technology, applied in the Internet field, that addresses problems such as non-standard words that cannot be pronounced

Active Publication Date: 2018-09-14
BEIJING UNISOUND INFORMATION TECH

AI Technical Summary

Problems solved by technology

[0003] A non-standard word may correspond to different pronunciations in different contexts. For example, "11" can stand alone as "11", and it can be



Embodiment Construction

[0036] The preferred embodiments of the present invention will be described below in conjunction with the accompanying drawings. It should be understood that the preferred embodiments described here are only used to illustrate and explain the present invention, and are not intended to limit the present invention.

[0037] Referring to Figure 1, the WFST-based text normalization method provided by this application includes the following steps:

[0038] S1: Classify non-Chinese characters in advance according to a weighted finite-state transducer (WFST), and write a corresponding conversion rule for each class of non-Chinese characters.

[0039] S2: Identify non-Chinese character strings in the target Chinese text based on the weighted finite-state transducer.

[0040] S3: According to the category to which the identified non-Chinese character string belongs, invoke the matching target conversion rule, and based on the target conversion rule, transcribe the identified non-Chinese character ...
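Steps S1 to S3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: ordered regular expressions stand in for the weighted finite-state transducer, and the three categories (`date`, `phone`, `cardinal`), their patterns, and all function names are hypothetical choices made for the example.

```python
import re

DIGITS = "零一二三四五六七八九"

def read_digits(s: str) -> str:
    """Read a digit string one digit at a time."""
    return "".join(DIGITS[int(ch)] for ch in s)

def read_cardinal(s: str) -> str:
    """Read an integer in 0..9999 as a Chinese cardinal number."""
    n = int(s)
    if n == 0:
        return DIGITS[0]
    units = ["", "十", "百", "千"]
    parts = []
    zero_pending = False
    for pos in (3, 2, 1, 0):
        d = (n // 10 ** pos) % 10
        if d == 0:
            # An inner zero is spoken once, and only if a digit follows it.
            zero_pending = bool(parts)
            continue
        if zero_pending:
            parts.append(DIGITS[0])
            zero_pending = False
        parts.append(DIGITS[d] + units[pos])
    text = "".join(parts)
    # 10..19 are read "十", "十一", ... rather than "一十", "一十一", ...
    return text[1:] if text.startswith("一十") else text

def convert_date(s: str) -> str:
    year, month, day = s.split("-")
    return f"{read_digits(year)}年{read_cardinal(month)}月{read_cardinal(day)}日"

# S1: hypothetical categories, each with a pattern and a conversion rule.
# Earlier entries win, which is a crude stand-in for WFST weights.
RULES = [
    ("date", r"(?<!\d)\d{4}-\d{1,2}-\d{1,2}(?!\d)", convert_date),
    ("phone", r"(?<!\d)1\d{10}(?!\d)", read_digits),
    ("cardinal", r"(?<!\d)\d{1,4}(?!\d)", read_cardinal),
]
PATTERN = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat, _ in RULES))
CONVERTERS = {name: fn for name, _, fn in RULES}

def normalize(text: str) -> str:
    """S2: find non-Chinese strings; S3: apply the rule for their category."""
    def repl(match: re.Match) -> str:
        category = match.lastgroup
        return CONVERTERS[category](match.group(category))
    return PATTERN.sub(repl, text)

print(normalize("今天是2018-09-14,我买了11个苹果"))
```

Digit strings that match none of the hypothetical categories (for example, a five-digit number) are left unchanged in this sketch; a fuller rule set would need a class for each kind of non-standard word it is expected to handle.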


Abstract

The invention provides a text normalization method and system based on WFST. The method comprises the following steps: classifying non-Chinese characters in advance according to a weighted finite-state transducer, and writing corresponding conversion rules for the classified non-Chinese characters; identifying non-Chinese character strings in a target Chinese text according to the weighted finite-state transducer; and invoking the matching target conversion rule according to the category of each identified non-Chinese character string, and converting the identified non-Chinese characters into Chinese characters according to that rule. The technical scheme can improve the accuracy of converting non-Chinese characters into Chinese characters.
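The role of the weights in a weighted finite-state transducer can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not the transducer described in the patent: the states, arcs, and weights are all hypothetical, and the string "11年" is used because it can be read either as a cardinal ("十一年", eleven years) or digit by digit ("一一年", the year '11). Each path's cost is the sum of its arc weights, and the cheapest accepting path supplies the output.

```python
from typing import Dict, List, Optional, Tuple

# An arc is (input symbol, output string, weight, next state).
Arc = Tuple[str, str, float, int]

def best_output(
    arcs: Dict[int, List[Arc]],
    finals: Dict[int, float],
    text: str,
    start: int = 0,
) -> Optional[Tuple[float, str]]:
    """Return (cost, output) of the cheapest accepting path, or None."""
    # For each reachable state, keep only the cheapest (cost, output) so far.
    frontier = {start: (0.0, "")}
    for sym in text:
        nxt: Dict[int, Tuple[float, str]] = {}
        for state, (cost, out) in frontier.items():
            for in_sym, out_str, weight, dst in arcs.get(state, []):
                if in_sym != sym:
                    continue
                cand = (cost + weight, out + out_str)
                if dst not in nxt or cand < nxt[dst]:
                    nxt[dst] = cand
        frontier = nxt
    accepted = [
        (cost + finals[state], out)
        for state, (cost, out) in frontier.items()
        if state in finals
    ]
    return min(accepted) if accepted else None

# Hypothetical transducer with two competing readings of "11" before "年".
ARCS = {
    0: [("1", "十", 0.0, 1),   # cardinal reading: "十" + "一"
        ("1", "一", 1.0, 3)],  # digit-by-digit reading, penalised by 1.0
    1: [("1", "一", 0.0, 2)],
    3: [("1", "一", 0.0, 2)],
    2: [("年", "年", 0.0, 4)],
}
FINALS = {4: 0.0}

print(best_output(ARCS, FINALS, "11年"))
```

With these hypothetical weights the cardinal reading wins; raising the weight on the "十" arc above 1.0 would make the digit-by-digit reading the cheapest path instead, which is how weights let one transducer encode a preference between competing rules.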

Description

Technical field

[0001] The invention relates to the technical field of the Internet, and in particular to a WFST-based text normalization method and system.

Background

[0002] Real-world text contains a large number of non-standard words, which cannot be found in dictionaries and whose pronunciation cannot be derived through the usual pinyin rules. In Chinese text, non-standard words are words containing non-Chinese characters, and these non-Chinese characters need to be converted into the corresponding Chinese characters. This conversion process is called text normalization. Text normalization is a key step in speech synthesis and a necessary step in speech recognition. Because non-standard words such as dates, prices, phone numbers, and organization names are often the focus of users' attention, text normalization directly affects the quality of voice services.

[0003] A non-standard word may correspond to different pronunciations in different contexts. For exam...


Application Information

IPC(8): G06F17/22, G06F17/30, G06K9/62
CPC: G06F40/151, G06F18/214
Inventor: 鲁俊
Owner: BEIJING UNISOUND INFORMATION TECH