
Speech recognition method based on CLDNN+CTC acoustic model

An acoustic model and speech recognition technology, applied in speech recognition, speech analysis, instruments, etc. It addresses the problems of overall performance degradation, inability to extract sufficiently discriminative features, and limited model fitting ability, and achieves a high recognition rate, noise robustness, and ease of training.

Status: Pending · Publication Date: 2020-04-14
Wuhan Shuixiang Electronic Technology Co., Ltd. (武汉水象电子科技有限公司)
Cites: 11 · Cited by: 4

AI Technical Summary

Problems solved by technology

The strategy of adding noise is not universal: the noise differs across scenarios, so augmenting data by adding noise is not a general-purpose solution;
[0019] The speech recognition method disclosed in Invention Patent 3 uses a multi-channel convolutional neural network as the acoustic model. The same speech data enters three identical convolutional channels, which cannot extract more discriminative features; at the same time, this makes the network structure more complex, requires a large amount of training data, and is prone to overfitting;
[0020] The speech recognition technology disclosed in Invention Patent 4 is based on a simple DCNN network model and outputs speech sequences end to end. Because it uses a CNN-based structure, its capacity to process data with strong temporal characteristics, such as speech, is limited; at the same time, the entire model has only 9 layers, so for Chinese speech recognition with a large vocabulary, the model's fitting ability is limited;
This modeling method involves three interdependent models; the shortcomings of any one model constrain the others, resulting in a sharp drop in overall performance.
The model combines syllables and acoustic features to determine whether the speech matches the text, which cannot substantially improve recognition accuracy.




Detailed Description of the Embodiments

[0042] The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present invention.

[0043] As shown in Figure 1, the present invention provides a speech recognition method based on a CLDNN+CTC acoustic model. The method comprises:

[0044] Step 1, obtaining a real-time speech signal, performing feature extraction on the speech signal, and obtaining a frame-by-frame acoustic feature sequence;
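
The listing does not specify which acoustic features are extracted. As a minimal sketch, the following assumes log-mel filterbank features computed with librosa; the 40 mel bands, 25 ms window, and 10 ms hop are illustrative choices, not values from the patent:

```python
# Minimal sketch of frame-by-frame acoustic feature extraction.
# The patent does not specify the feature type; log-mel filterbank
# features (40 bands, 25 ms window, 10 ms hop) are assumed here.
import numpy as np
import librosa

def extract_features(signal: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Return a (num_frames, 40) frame-by-frame log-mel feature sequence."""
    mel = librosa.feature.melspectrogram(
        y=signal,
        sr=sr,
        n_fft=int(0.025 * sr),       # 25 ms analysis window
        hop_length=int(0.010 * sr),  # 10 ms frame shift
        n_mels=40,
    )
    return np.log(mel + 1e-6).T      # log compression; one row per frame

# Usage (hypothetical file name):
signal, sr = librosa.load("utterance.wav", sr=16000)
features = extract_features(signal, sr)   # shape: (num_frames, 40)
```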

[0045] Step 2, using the acoustic feature sequence as the input of the CLDNN+CTC acoustic model, and outputting the phoneme sequence;...
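
The visible text names the CLDNN+CTC architecture (convolutional, LSTM, and fully connected layers trained with a CTC loss) but does not disclose layer sizes. Below is a minimal PyTorch sketch of that model family; every dimension here (two convolutional layers, a two-layer 256-unit LSTM, a 100-phoneme inventory) is an illustrative assumption rather than the patented configuration:

```python
# Minimal PyTorch sketch of a CLDNN acoustic model trained with CTC.
# All layer counts and sizes are illustrative assumptions; the listing
# above does not disclose the exact configuration.
import torch
import torch.nn as nn

class CLDNN(nn.Module):
    def __init__(self, n_feats: int = 40, n_phonemes: int = 100):
        super().__init__()
        # C: convolutional front end over (time, frequency)
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # L: recurrent layers modeling temporal structure
        self.lstm = nn.LSTM(32 * n_feats, 256, num_layers=2, batch_first=True)
        # D: fully connected output layer; index 0 is the CTC blank
        self.fc = nn.Linear(256, n_phonemes + 1)

    def forward(self, x):               # x: (batch, time, n_feats)
        b, t, f = x.shape
        h = self.conv(x.unsqueeze(1))   # (batch, 32, time, n_feats)
        h = h.permute(0, 2, 1, 3).reshape(b, t, -1)
        h, _ = self.lstm(h)
        return self.fc(h).log_softmax(dim=-1)  # (batch, time, n_phonemes+1)

model = CLDNN()
ctc_loss = nn.CTCLoss(blank=0)

# One hypothetical training step on random data.
feats = torch.randn(4, 200, 40)            # 4 utterances, 200 frames each
targets = torch.randint(1, 101, (4, 30))   # phoneme label sequences
log_probs = model(feats).permute(1, 0, 2)  # CTCLoss expects (T, B, C)
loss = ctc_loss(log_probs, targets,
                input_lengths=torch.full((4,), 200),
                target_lengths=torch.full((4,), 30))
loss.backward()
```

The CTC loss is what lets Step 2 output a phoneme sequence without frame-level alignments, so the acoustic stage can be trained end to end.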



Abstract

The invention discloses a speech recognition method based on a CLDNN+CTC acoustic model. The method comprises the steps of: 1, obtaining a real-time speech signal, performing feature extraction on the speech signal, and obtaining a frame-by-frame acoustic feature sequence; 2, taking the acoustic feature sequence as the input of a CLDNN+CTC acoustic model and outputting a phoneme sequence; and 3, establishing a decoding model for converting the phoneme sequence into a character sequence, taking the phoneme sequence as the input of the decoding model, and outputting the character sequence through the decoding model. The method is a two-stage end-to-end (seq2seq) speech recognition method comprising an end-to-end 'speech to phoneme sequence' model and an end-to-end 'phoneme sequence to character sequence' model. Unlike the existing end-to-end 'speech to character sequence' model, the two models do not need super-large-scale corpus training, the advantages of the two parts are complementary, and the language model can make up for the deficiency of the acoustic model in a noisy environment.
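
To make the two-stage pipeline concrete, here is a hedged sketch of inference: stage 1 collapses the acoustic model's frame-wise phoneme outputs into a phoneme sequence via greedy CTC decoding, and stage 2 converts phonemes to characters. The patent trains a dedicated seq2seq decoding model for stage 2; the lookup-table stub below only marks where that model plugs in, and the phoneme inventory is invented for illustration:

```python
# Hedged sketch of the two-stage inference pipeline described in the abstract:
# stage 1 maps speech to a phoneme sequence (greedy CTC decoding shown here),
# stage 2 maps the phoneme sequence to a character sequence. The phoneme
# inventory and the phoneme-to-character model are placeholders.
from typing import List

BLANK = 0  # CTC blank index, assumed to be 0

def ctc_greedy_decode(frame_labels: List[int]) -> List[int]:
    """Collapse repeated labels, then drop blanks (standard CTC decoding)."""
    phonemes, prev = [], None
    for label in frame_labels:
        if label != prev and label != BLANK:
            phonemes.append(label)
        prev = label
    return phonemes

def phonemes_to_characters(phonemes: List[int]) -> str:
    """Stage-2 stub: the patent trains a seq2seq decoding model here.
    This placeholder just looks phonemes up in a toy table."""
    toy_lexicon = {1: "n", 2: "i", 3: "h", 4: "a", 5: "o"}
    return "".join(toy_lexicon.get(p, "?") for p in phonemes)

# Usage: per-frame argmax output of the acoustic model -> characters.
frame_labels = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 4, 4, 0, 5]
phonemes = ctc_greedy_decode(frame_labels)   # [1, 2, 3, 4, 5]
print(phonemes_to_characters(phonemes))      # "nihao"
```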

Description

Technical Field

[0001] The invention relates to the field of speech recognition, and in particular to a speech recognition method based on a CLDNN+CTC acoustic model.

Background Technique

[0002] Speech is the most common and effective way for humans to interact, and it has long been an important part of research on human-computer communication and human-computer interaction. Human-computer voice interaction technology, composed of speech synthesis, speech recognition, and natural language understanding, is recognized worldwide as a difficult and challenging technical field. At the same time, speech recognition technology can enter industries such as industrial production, electronic communication, automotive electronics, medical care, and service education, and will lead the information technology revolution to a new level.

[0003] Speech recognition is also known as automatic speech recognition (ASR). Automatic spe...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC(8): G10L15/02; G10L25/78
CPC: G10L15/02; G10L25/78; G10L2015/025
Inventors: Liu Huifen (柳慧芬), Yuan Xi (袁熹)
Owner: Wuhan Shuixiang Electronic Technology Co., Ltd. (武汉水象电子科技有限公司)