
Speaker recognition method based on Gaussian mixture model embedded with time delay neural network

A speaker recognition technique based on a Gaussian mixture model, applied in the field of speaker recognition

Status: Inactive. Publication Date: 2011-04-27
戴红霞 +2

AI Technical Summary

Problems solved by technology

At present, the GMM and the TDNN are each used for speaker recognition only in isolation; no method combines their respective advantages to further improve speaker recognition performance.




Embodiment Construction

[0069] The technical solutions of the present invention will be further described below in conjunction with the drawings and embodiments.

[0070] Figure 1 shows the training and recognition model for speaker recognition with an embedded TDNN. It differs from the baseline GMM model (in which only the GMM is used for speaker recognition) in both training and recognition.

[0071] 1. Preprocessing and feature extraction

[0072] First, silence is detected using a method based on energy and zero-crossing rate, and noise is removed by spectral subtraction. The signal is then pre-emphasized with the filter f(z) = 1 - 0.97z^{-1} and divided into frames using a Hamming window with a length of 20 ms and a window shift of 10 ms. A 20th-order linear prediction (LPC) analysis is performed on each frame, and 13th-order cepstral coefficients are derived from the 20th-order LPC coefficients to serve as the feature vectors for speaker recognition.
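The framing and LPC-cepstrum steps of paragraph [0072] can be sketched as follows. This is a minimal illustration, not the patent's implementation: silence detection and spectral subtraction are omitted, the function names are hypothetical, the LPC coefficients are estimated with the autocorrelation method (the source does not say which method is used), and "13th-order" is assumed to mean the coefficients c_1 to c_13.

```python
import numpy as np

def pre_emphasize(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1], i.e. the filter f(z) = 1 - alpha * z^-1
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, sample_rate, frame_ms=20, shift_ms=10):
    # Split into overlapping frames and apply a Hamming window to each one.
    frame_len = int(sample_rate * frame_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    n_frames = 1 + (len(signal) - frame_len) // shift
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)

def lpc(frame, order=20):
    # Autocorrelation method with the Levinson-Durbin recursion (an assumption).
    # Returns a[0..order] with a[0] = 1, where A(z) = 1 + sum_k a[k] z^-k.
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12  # small offset guards against all-zero frames
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def lpc_to_cepstrum(a, n_ceps=13):
    # Cepstrum of the all-pole model 1/A(z); valid for n_ceps < len(a):
    # c[n] = -a[n] - (1/n) * sum_{k=1}^{n-1} k * c[k] * a[n-k]
    c = np.zeros(n_ceps + 1)
    for n in range(1, n_ceps + 1):
        acc = sum(k * c[k] * a[n - k] for k in range(1, n))
        c[n] = -a[n] - acc / n
    return c[1:]

def extract_features(signal, sample_rate, order=20, n_ceps=13):
    # One n_ceps-dimensional cepstral vector per 20 ms frame.
    frames = frame_signal(pre_emphasize(signal), sample_rate)
    return np.array([lpc_to_cepstrum(lpc(f, order), n_ceps) for f in frames])
```

For example, calling `extract_features` on one second of audio sampled at 8 kHz would yield one 13-dimensional vector for each 10 ms shift.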

[0073] 2. Speaker model training

[0074] During training, the process o...


Abstract

The invention discloses a speaker recognition method based on a Gaussian mixture model (GMM) with an embedded time delay neural network (TDNN). The method combines the advantages of both models by embedding the TDNN in the GMM. The time-delay network transforms the input feature vectors, making full use of their temporal ordering, and the residual between the TDNN's input and output vectors is computed. This residual is used to train the GMM by the expectation-maximization (EM) method. In turn, the likelihood obtained from the updated GMM parameters and the residual is used to update the TDNN parameters by backpropagation with a momentum (inertia) term, so that the parameters of the GMM and the TDNN are updated alternately. Experiments show that the method achieves a somewhat higher recognition rate than the baseline GMM at various signal-to-noise ratios.
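The GMM half of the alternating update described above can be sketched roughly as follows. This is only an illustration under several assumptions that the excerpt does not confirm: the "TDNN" here is a single hypothetical tanh layer over a window of neighbouring frames, the residual is taken as input minus output, the GMM uses diagonal covariances, and the TDNN update by backpropagation with momentum is omitted. All function names are hypothetical.

```python
import numpy as np

def tdnn_residual(X, W, b, context=2):
    # Toy stand-in for the TDNN: one tanh layer that sees each frame together
    # with `context` neighbours on either side. X: (T, D); W: (D, D*(2*context+1)).
    T = X.shape[0]
    padded = np.pad(X, ((context, context), (0, 0)), mode="edge")
    stacked = np.hstack([padded[k:k + T] for k in range(2 * context + 1)])
    out = np.tanh(stacked @ W.T + b)
    return X - out  # residual between input and output vectors (assumed form)

def gmm_log_likelihood(R, weights, means, variances):
    # Diagonal-covariance GMM. Returns the total log-likelihood of the
    # residuals R and the log responsibilities with shape (T, M).
    diff = R[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_joint = np.log(weights) + log_gauss
    peak = log_joint.max(axis=1, keepdims=True)
    log_frame = peak[:, 0] + np.log(np.exp(log_joint - peak).sum(axis=1))
    return log_frame.sum(), log_joint - log_frame[:, None]

def em_step(R, weights, means, variances, var_floor=1e-6):
    # One EM iteration on the residuals: E-step responsibilities, then
    # closed-form M-step updates of weights, means and variances.
    _, log_resp = gmm_log_likelihood(R, weights, means, variances)
    resp = np.exp(log_resp)
    nk = resp.sum(axis=0)
    new_weights = nk / R.shape[0]
    new_means = (resp.T @ R) / nk[:, None]
    new_vars = (resp.T @ R ** 2) / nk[:, None] - new_means ** 2
    return new_weights, new_means, np.maximum(new_vars, var_floor)
```

In a full alternating scheme, each `em_step` on the residuals would be followed by a gradient step on the TDNN weights that increases the same log-likelihood, after which the residuals are recomputed.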

Description

Technical field

[0001] The invention relates to a speaker recognition method, in particular to a speaker recognition method based on a Gaussian mixture model with an embedded time delay neural network.

Background

[0002] Automatic speaker recognition, especially text-independent speaker recognition, plays an increasingly important role in access control, credit card transactions, and court evidence. Its goal is to correctly identify the speech to be recognized as belonging to one of the many reference speakers in the speech library.

[0003] Among speaker recognition methods, those based on the Gaussian mixture model (GMM) have received increasing attention. Owing to its high recognition rate, simple training, and modest training data requirements, the GMM has become the mainstream recognition method. Since the Gaussian mixture model (GMM) has a good ability to represent the distribution of data, as long as there are enough items and enough training...


Application Information

IPC(8): G10L15/00, G10L15/06, G10L15/28, G10L25/24
Inventor: 戴红霞, 王吉林, 余华, 魏昕, 赵力
Owner: 戴红霞