Unified Chinese-English mixed text generation and speech recognition end-to-end framework

A mixed-text generation and speech recognition technology, applied in speech recognition, speech analysis, instruments, etc., that addresses problems such as data mismatch.

Active Publication Date: 2021-08-20

INST OF AUTOMATION CHINESE ACAD OF SCI

Cites: 11; Cited by: 10

AI Technical Summary

Problems solved by technology

Although training data for the speech recognition model can be obtained in this way, the synthetic data does not match the real data. Using synthetic data to improve the performance of the recognition system is therefore a challenging problem.

Examples

Embodiment 1

[0086] As shown in Figure 1, the end-to-end framework for unified Chinese-English mixed text generation and speech recognition provided by an embodiment of the present application includes:

[0087] A Chinese-English mixed phoneme sequence generation module, a speech feature extraction module, an acoustic feature sequence convolutional downsampling module, an acoustic encoder, a phoneme embedding module, a phoneme encoder, a discriminator, and a decoder. The phoneme encoder and the discriminator constitute a generative adversarial network: the phoneme encoder serves as the generator of the generative adversarial network, the discriminator serves as its discriminator, and the acoustic encoder provides its real-data input. This generative adversarial network is used to push the distribution of the phoneme encoded representation output by the phoneme encoder close to the acoustic encoded representation output by the acoustic c...
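The adversarial relationship described in [0087] can be sketched in a few lines. This is only an illustration under stated assumptions: the excerpt does not give the loss functions, so the standard GAN objectives are assumed here, and the toy linear discriminator, the vectors, and all names (`discriminator`, `acoustic_rep`, `phoneme_rep`, `weights`) are hypothetical.

```python
import math
from typing import Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def discriminator(rep: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Toy linear discriminator (assumption): probability that `rep`
    came from the acoustic encoder, i.e. is "real" data."""
    return sigmoid(sum(w * r for w, r in zip(weights, rep)) + bias)


def discriminator_loss(p_real: float, p_fake: float) -> float:
    # Assumed standard GAN objective: acoustic representations are
    # labeled 1, phoneme representations are labeled 0.
    return -(math.log(p_real) + math.log(1.0 - p_fake))


def generator_loss(p_fake: float) -> float:
    # Assumed non-saturating generator objective: the phoneme encoder
    # is rewarded when its output is scored as "real".
    return -math.log(p_fake)


# Hypothetical encoder outputs for one frame.
acoustic_rep = [0.9, 0.1, 0.4]  # real-data input (acoustic encoder)
phoneme_rep = [0.2, 0.7, 0.3]   # generator output (phoneme encoder)
weights, bias = [1.0, -1.0, 0.5], 0.0

p_real = discriminator(acoustic_rep, weights, bias)
p_fake = discriminator(phoneme_rep, weights, bias)
d_loss = discriminator_loss(p_real, p_fake)
g_loss = generator_loss(p_fake)
print(f"p_real={p_real:.3f} p_fake={p_fake:.3f} d_loss={d_loss:.3f} g_loss={g_loss:.3f}")
```

Minimizing `g_loss` with respect to the phoneme encoder's parameters moves its outputs toward regions the discriminator scores as acoustic, which is the distribution-matching effect the paragraph describes.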

Abstract

The invention provides a universal, unified end-to-end framework for Chinese-English mixed text generation and speech recognition. The framework comprises an acoustic encoder, a phoneme encoder, a discriminator, and a decoder. The phoneme encoder and the discriminator form a generative adversarial network, in which the phoneme encoder serves as the generator, the discriminator serves as the discriminator, and the acoustic encoder provides the real-data input. The generative adversarial network pushes the distribution of the phoneme coding representations output by the phoneme encoder toward the acoustic coding representations output by the acoustic encoder. The decoder fuses the acoustic coding representations and the phoneme coding representations to obtain a decoding representation, which is input into a softmax function to obtain the output target with the maximum probability.
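The final decoding step in the abstract (fuse the two representations, apply softmax, pick the most probable target) can be illustrated as follows. This is a minimal sketch, not the patented decoder: the abstract does not say how the fusion is done, so an element-wise mean is assumed, and the projection matrix, the three-token vocabulary, and the names `fuse` and `decode_step` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    # Subtract the maximum for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(acoustic: Sequence[float], phoneme: Sequence[float]) -> List[float]:
    # ASSUMPTION: element-wise mean stands in for the unspecified fusion.
    return [(a + p) / 2.0 for a, p in zip(acoustic, phoneme)]


def decode_step(
    acoustic: Sequence[float],
    phoneme: Sequence[float],
    projection: Sequence[Sequence[float]],
    vocab: Sequence[str],
) -> Tuple[str, List[float]]:
    """Fuse both representations, project to vocabulary logits,
    and return the token with the maximum softmax probability."""
    fused = fuse(acoustic, phoneme)
    logits = [sum(w * f for w, f in zip(row, fused)) for row in projection]
    probs = softmax(logits)
    best = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[best], probs


# Hypothetical mixed Chinese-English vocabulary and projection weights.
vocab = ["你", "好", "hello"]
projection = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
token, probs = decode_step([0.6, 0.2], [0.4, 0.4], projection, vocab)
print(token, [round(p, 3) for p in probs])
```

With these toy numbers the fused vector is `[0.5, 0.3]`, the logits are `[0.5, 0.3, 1.6]`, and the selected token is `hello`.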

Description

Technical field

[0001] This application relates to the field of speech recognition, and in particular to an end-to-end framework that unifies Chinese-English mixed text generation and speech recognition.

Background

[0002] Chinese-English mixing refers to the use of both Chinese and English expressions while speaking. It takes two main forms: inter-sentential switching and intra-sentential switching. Intra-sentential switching in particular poses great challenges for speech recognition technology. The main problems are accents caused by speakers' non-standard pronunciation, a larger and more complex set of modeling units, coarticulation across languages, difficulty in collecting data, and difficulty in labeling data. With the development of deep learning, monolingual speech recognition technology has improved greatly. Especially for the end-to-end speech recognition model, its...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L15/06, G10L15/02, G10L15/183, G10L15/26

CPC: G10L15/02, G10L15/063, G10L15/183, G10L15/26, G10L2015/025

Inventor: 陶建华, 张帅, 易江燕

Owner: INST OF AUTOMATION CHINESE ACAD OF SCI
