Method for training acoustic model based on CTC (Connectionist Temporal Classification)

An acoustic model training technique, applied in speech analysis and speech recognition, that addresses three problems: CTC acoustic models performing worse than CE models, unstable model training, and performance degradation on small and medium-sized datasets. Its effects are improved independence and recognition performance, fewer search paths, and easier parallel computation.

Active Publication Date: 2018-07-10
INST OF ACOUSTICS CHINESE ACAD OF SCI +1
4 Cites, 18 Cited by

AI Technical Summary

Problems solved by technology

[0013] During training of a CTC model, every path that maps to the correct text sequence is included in the forward-backward search. This includes extremely asymmetric paths, in which phonemes appear much later or much earlier than they actually occur, and these paths make model training unstable.
In addition, the traditional CTC architecture is trained with an RNN. The RNN's long-term modeling capability can greatly improve CTC performance, but RNNs are hard to train in parallel, so training is slow and inefficient.
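
To make the path problem concrete, the following minimal sketch brute-forces every frame-level path that standard CTC (one shared blank) accepts for a two-phoneme label sequence over four frames. The helper names `collapse` and `valid_paths` and the `-` blank marker are illustrative assumptions, not taken from the patent.

```python
from itertools import product

BLANK = "-"  # single shared blank symbol, as in standard CTC (illustrative marker)


def collapse(path):
    """Standard CTC mapping: merge repeated symbols, then drop blanks."""
    out = []
    prev = None
    for sym in path:
        if sym != prev and sym != BLANK:
            out.append(sym)
        prev = sym
    return out


def valid_paths(labels, num_frames):
    """Brute-force every frame-level path that collapses to `labels`."""
    alphabet = sorted(set(labels)) + [BLANK]
    return [p for p in product(alphabet, repeat=num_frames)
            if collapse(p) == list(labels)]


paths = valid_paths(["a", "b"], 4)
print(len(paths))  # number of paths the forward-backward search sums over
for p in paths:
    print("".join(p))
```

The enumeration contains heavily skewed paths such as `aaab` and `--ab` alongside balanced ones such as `aabb`, and all of them contribute to the CTC objective regardless of where the phonemes actually occur in the audio.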
[0014] Although CTC simplifies the training procedure, its recognition accuracy is not competitive with, and is slightly lower than, that of the traditional cross-entropy (CE) method. The degradation is more severe on small and medium-sized datasets, where the CTC acoustic model usually performs worse than the CE model.
In addition, CTC training is extremely unstable and prone to divergence.

Examples

Embodiment Construction

[0029] The present invention will be further described in detail below in conjunction with the accompanying drawings.

[0030] As shown in Figure 2, the present invention provides a method for training an acoustic model based on CTC. First, the method uses multiple independent "blank" symbols in place of the single "blank" symbol shared by all phonemes in the original CTC model. Next, an initial GMM model aligns the phoneme label sequence of the training data with time points to obtain the approximate location of each phoneme. A search path graph for the CTC forward-backward computation is then constructed for the phoneme label sequence with the "blank" symbols added. Finally, a configurable parameter, "Time Tolerance", allows each phoneme to appear slightly earlier or later in the search path, but only within the "Time Tolerance" range; this parameter bounds the time range in which each element may occur and is usually set to 50-300 milliseconds. In this embodi...
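
The two preparatory steps described above, phoneme-specific blanks and a tolerance window around each aligned phoneme, might be sketched as follows. This is an illustration under stated assumptions: the alignment format (inclusive frame indices), the 10 ms frame shift, the `<blank_...>` naming, and the function names are all hypothetical and not specified in the source.

```python
def add_phoneme_blanks(phonemes):
    """Follow each phoneme with its own blank symbol instead of a shared one."""
    states = []
    for ph in phonemes:
        states.append(ph)
        states.append(f"<blank_{ph}>")  # hypothetical naming scheme
    return states


def tolerance_windows(alignment, num_frames, tolerance_ms, frame_shift_ms=10):
    """Widen each aligned phoneme segment by the time tolerance.

    alignment: list of (phoneme, first_frame, last_frame), inclusive,
    in an assumed forced-alignment output format.
    Returns one (phoneme, earliest_frame, latest_frame) tuple per phoneme.
    """
    tol = tolerance_ms // frame_shift_ms  # tolerance expressed in frames
    windows = []
    for ph, first, last in alignment:
        earliest = max(0, first - tol)
        latest = min(num_frames - 1, last + tol)
        windows.append((ph, earliest, latest))
    return windows


# Toy alignment for a 40-frame utterance (made-up values for illustration).
alignment = [("k", 0, 9), ("ae", 10, 24), ("t", 25, 39)]
states = add_phoneme_blanks([ph for ph, _, _ in alignment])
windows = tolerance_windows(alignment, num_frames=40, tolerance_ms=50)
print(states)
print(windows)
```

With a 50 ms tolerance and the assumed 10 ms frame shift, each phoneme may start up to five frames earlier or end up to five frames later than its aligned segment; any search path that places it outside that window would be pruned.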

Abstract

The invention provides a method for training an acoustic model based on CTC (Connectionist Temporal Classification). The method comprises the following steps: 1, training an initial GMM (Gaussian Mixture Model) and using it to perform time-point forced alignment on the text annotation of the training data, obtaining the time region corresponding to each phoneme; 2, inserting a blank symbol associated with the phoneme after each phoneme, so that each phoneme has a unique blank symbol; 3, using a finite state machine to construct the search path graph for the CTC forward-backward computation over the phoneme annotation sequence with the blank symbols added; 4, restricting the time range in which each phoneme may appear according to the time alignment result, pruning the search path graph, and cutting off paths in which a phoneme position exceeds the time restriction, thereby obtaining the final search path graph needed to compute the network error in CTC; and 5, training the acoustic model by combining a time-delay neural network (TDNN) structure with the CTC method to obtain the final TDNN-CTC acoustic model.
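
The effect of step 4 on the forward computation can be illustrated with a small sketch. It assumes a simplified state topology that the source does not spell out: states alternate phoneme and phoneme-specific blank, each state may repeat, advance by one, or (for phoneme states) skip the preceding blank. The function name `constrained_forward` and the `probs` and `mask` layouts are hypothetical.

```python
def constrained_forward(probs, mask):
    """Forward pass over a pruned search graph (plain probabilities, not logs).

    probs[t][s]: network output probability of state s's symbol at frame t.
    mask[t][s]:  True if state s is allowed at frame t after time pruning.
    States alternate phoneme (even index) and its blank (odd index);
    this topology is an assumption made for illustration.
    """
    num_frames, num_states = len(probs), len(probs[0])
    alpha = [[0.0] * num_states for _ in range(num_frames)]
    if mask[0][0]:
        alpha[0][0] = probs[0][0]  # every path starts at the first phoneme
    for t in range(1, num_frames):
        for s in range(num_states):
            if not mask[t][s]:
                continue  # pruned: phoneme position outside its time window
            total = alpha[t - 1][s]               # stay in the same state
            if s >= 1:
                total += alpha[t - 1][s - 1]      # advance by one state
            if s >= 2 and s % 2 == 0:
                total += alpha[t - 1][s - 2]      # skip the previous blank
            alpha[t][s] = total * probs[t][s]
    # A path may end in the last phoneme or in its blank.
    return alpha[-1][-1] + alpha[-1][-2]


# Two phonemes -> 4 states, 3 frames. Uniform probabilities of 1.0 turn the
# forward score into a count of surviving paths.
ones = [[1.0] * 4 for _ in range(3)]
full_mask = [[True] * 4 for _ in range(3)]
pruned_mask = [row[:] for row in full_mask]
pruned_mask[1][2] = False  # forbid the second phoneme at frame 1

unpruned_paths = constrained_forward(ones, full_mask)
pruned_paths = constrained_forward(ones, pruned_mask)
print(unpruned_paths, pruned_paths)
```

In this toy case, forbidding one phoneme at one frame halves the number of paths that contribute to the score, which is the mechanism by which time-based pruning shrinks the search graph.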

Description

Technical field

[0001] The invention relates to the technical field of speech recognition, in particular to a method for training an acoustic model based on CTC.

Background

[0002] In recent years, the introduction of Deep Neural Networks (DNN) for acoustic modeling in speech recognition systems has achieved great success. Thanks to its excellent classification ability, a DNN can replace the Gaussian Mixture Model (GMM) in the traditional Hidden Markov Model (HMM) architecture to generate posterior probabilities. However, this new HMM/DNN architecture is very complicated to train. Researchers therefore began to explore end-to-end learning, in which a sequence of speech features is input and its text sequence is obtained directly. In this context, combining Connectionist Temporal Classification (CTC) with a Recurrent Neural Network (RNN) has attracted growing attention.

[0003] There are two main differences between CTC an...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L15/14, G10L15/16, G10L15/02
CPC: G10L15/02, G10L15/144, G10L15/16, G10L2015/025
Inventor: 张鹏远, 王智超, 潘接林, 颜永红
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI