
Voice recognition method based on DBLSTM+CTC acoustic model

An acoustic model and speech recognition technology, applied in speech recognition, speech analysis, and instruments. It addresses problems such as overall performance degradation, the inability to extract more discriminative features, and limited model-fitting capacity, with the effect of strong noise robustness and a high recognition rate.

Pending Publication Date: 2020-04-14
Wuhan Shuixiang Electronic Technology Co., Ltd. (武汉水象电子科技有限公司)

AI Technical Summary

Problems solved by technology

This noise-augmentation strategy is not universal: noise differs across scenarios, so augmenting data by adding noise is not a general solution;
[0019] The speech recognition method disclosed in Invention Patent 3 uses a multi-channel convolutional neural network as the acoustic model. The same voice data enters three identical convolutional channels, which cannot extract more discriminative features; at the same time, it makes the network structure more complex, requires a large amount of training data, and is prone to overfitting;
[0020] The speech recognition technology disclosed in Invention Patent 4 is based on a simple DCNN network model and outputs speech sequences end-to-end. Because it uses a CNN-based structure, its capacity to process data with strong temporal characteristics, such as speech, is limited; moreover, the entire model has only 9 layers, so for large-vocabulary Chinese speech recognition its fitting ability is limited;
This modeling approach involves three interdependent models; the shortcomings of any one model constrain the others, causing a sharp drop in overall performance.
The model combines syllables and acoustic features to judge whether the speech matches the text, which cannot substantially improve recognition accuracy.

Method used




Embodiment Construction

[0042] The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

[0043] As shown in Figure 1, the speech recognition method based on a DBLSTM+CTC acoustic model provided by the present invention comprises the following steps:

[0044] Step 1, obtaining a real-time speech signal, performing feature extraction on the speech signal, and obtaining a frame-by-frame acoustic feature sequence;

[0045] Step 2, using the acoustic feature sequence as the input of the DBLSTM+CTC acoustic model, and outputting the phoneme sequence;

Step 3, establishing a decoding model for converting the phoneme sequence into a character sequence, using the phoneme sequence as the input of the decoding model, and outputting the character sequence through the decoding model.
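Step 1's framing of the waveform into a frame-by-frame acoustic feature sequence can be sketched as follows. This is not from the patent; it is a generic illustration assuming 16 kHz audio with 25 ms frames (400 samples) and a 10 ms hop (160 samples), with a Hamming window applied before spectral features such as MFCCs would be computed:

```python
import numpy as np

def frame_signal(signal, frame_len=400, hop=160):
    """Split a 1-D waveform into overlapping frames and apply a Hamming window.

    frame_len=400 and hop=160 correspond to 25 ms / 10 ms at a 16 kHz
    sampling rate (assumed values; the patent does not specify them).
    Assumes len(signal) >= frame_len.
    """
    n = 1 + (len(signal) - frame_len) // hop   # number of complete frames
    frames = np.stack([signal[i * hop : i * hop + frame_len] for i in range(n)])
    return frames * np.hamming(frame_len)      # taper each frame's edges
```

Each windowed frame would then be turned into one feature vector (e.g. filter-bank or MFCC coefficients), yielding the frame-by-frame sequence that feeds the acoustic model.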
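The CTC output layer named in Step 2 maps the network's frame-level outputs to a phoneme sequence via best-path decoding: take the most likely label per frame, collapse consecutive repeats, then drop blanks. A minimal sketch of that collapse step (not the patent's implementation; the label IDs and blank index are illustrative):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Standard CTC best-path collapse: merge consecutive repeated labels,
    then remove blank symbols, turning per-frame outputs into a label sequence.
    `frame_ids` is the argmax label per frame; `blank` is the CTC blank ID."""
    out = []
    prev = None
    for label in frame_ids:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

For example, per-frame outputs `[blank, a, a, blank, a, b, b, blank]` decode to `[a, a, b]`: the repeat of `a` across the blank is kept as two phonemes, while the within-run repeats merge.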



Abstract

The invention discloses a voice recognition method based on a DBLSTM+CTC acoustic model. The method comprises the steps of: 1. obtaining a real-time voice signal, carrying out feature extraction on the voice signal, and obtaining a frame-by-frame acoustic feature sequence; 2. taking the acoustic feature sequence as the input of a DBLSTM+CTC acoustic model and outputting a phoneme sequence; and 3. establishing a decoding model for converting the phoneme sequence into a character sequence, taking the phoneme sequence as the input of the decoding model, and outputting the character sequence through the decoding model. The method is a two-stage end-to-end (seq2seq) voice recognition method, comprising a "voice-to-phoneme-sequence" end-to-end model and a "phoneme-sequence-to-character-sequence" end-to-end model. Unlike the existing "voice-to-character-sequence" end-to-end model, the two models do not need ultra-large-scale corpus training, their advantages are complementary, and the language model can make up for the deficiency of the acoustic model in a noisy environment.

Description

Technical field

[0001] The invention relates to the field of speech recognition, and in particular to a speech recognition method based on a DBLSTM+CTC acoustic model.

Background technique

[0002] Speech is the most common and effective way for humans to interact, and it has always been an important part of research on human-computer communication and human-computer interaction. Human-computer voice interaction technology, which is composed of speech synthesis, speech recognition, and natural language understanding, is recognized worldwide as a difficult and challenging technical field. At the same time, speech recognition technology can enter industries such as industrial production, electronic communication, automotive electronics, medical care, and service education, and will lead the information technology revolution to a new level.

[0003] Speech recognition is also known as automatic speech recognition (Automatic Speech Recognition, ASR). Automatic sp...

Claims


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/02; G10L15/05; G10L15/06; G10L15/18
CPC: G10L15/02; G10L15/063; G10L15/05; G10L15/18; G10L2015/025
Inventors: Yuan Xi (袁熹), Liu Huifen (柳慧芬)
Owner: Wuhan Shuixiang Electronic Technology Co., Ltd. (武汉水象电子科技有限公司)