
Binaural speech separation method based on LSTM (Long Short Term Memory) network

A binaural speech separation technology based on long short-term memory networks, applied in speech analysis, instruments, etc., that addresses problems such as performance degradation.

Active Publication Date: 2020-01-24
SOUTHEAST UNIV
4 Cites, 9 Cited by

AI Technical Summary

Problems solved by technology

[0010] Purpose of the invention: Existing binaural speech separation algorithms degrade sharply under high noise and strong reverberation. To address this, the present invention proposes a binaural speech separation method based on a long short-term memory (LSTM) network, in which the LSTM network is trained on feature parameters from multiple environments.


Examples

Embodiment Construction

[0067] As shown in Figure 1, the binaural speech separation method based on the LSTM network provided by this embodiment includes the following steps:

[0068] Step 1. Convolve two different monophonic speech signals from the training speech with the head-related impulse responses (HRIRs) of different azimuth angles to generate two single-source binaural training speech signals at different azimuths. The signals are computed as:

[0069] $s_{1,L}(n) = s_1(n) * h_{1,L}, \quad s_{2,L}(n) = s_2(n) * h_{2,L}$

[0070] $s_{1,R}(n) = s_1(n) * h_{1,R}, \quad s_{2,R}(n) = s_2(n) * h_{2,R}$

[0071] Here, $*$ denotes convolution; $s_1(n)$ and $s_2(n)$ are two different monophonic speech signals; $s_{1,L}(n)$ and $s_{1,R}(n)$ are the single-source left-ear and right-ear speech signals corresponding to azimuth 1; $h_{1,L}$ and $h_{1,R}$ are the left-ear and right-ear HRIRs corresponding to azimuth 1; $s_{2,L}(n)$ and $s_{2,R}(n)$ are the single-source left-ear and right-ear speech signals cor...
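Step 1 can be illustrated with a minimal Python sketch. It assumes the signals and HRIRs are plain lists of samples; the helper names `convolve` and `make_binaural` and the toy HRIR values are hypothetical and are not taken from the patent. In practice a library routine such as `numpy.convolve` would typically replace the explicit loop.

```python
def convolve(signal, impulse_response):
    """Full linear convolution of two finite sequences (lists of floats)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def make_binaural(source, hrir_left, hrir_right):
    """Return the (left-ear, right-ear) signals for one source at one azimuth."""
    return convolve(source, hrir_left), convolve(source, hrir_right)


# Toy data (hypothetical, not measured HRIRs): the right-ear response is
# delayed by one sample and attenuated, mimicking a source on the left side.
s1 = [1.0, 0.5, -0.25]
h1_L = [1.0, 0.0]
h1_R = [0.0, 0.5]

s1_L, s1_R = make_binaural(s1, h1_L, h1_R)
print(s1_L)  # [1.0, 0.5, -0.25, 0.0]
print(s1_R)  # [0.0, 0.5, 0.25, -0.125]
```

The second source would be processed the same way with the HRIR pair of its own azimuth.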

Abstract

The invention discloses a binaural speech separation method based on an LSTM (Long Short-Term Memory) network. The ITD (Interaural Time Difference), the IID (Interaural Intensity Difference) and the CCF (Cross-Correlation Function) of each time-frequency unit of a training binaural speech signal are extracted as spatial features for separation. The spatial features of the current frame and of the five preceding and five following frames of the time-frequency units in the same subband are used as input parameters to train a bidirectional LSTM network, yielding an LSTM-based separation model. At the test stage, the spatial features of the current frame and of the five preceding and five following frames of the time-frequency units of a test binaural speech signal are fed to the trained bidirectional LSTM network, which estimates the mask value of the target speech for the current time-frequency unit; speech separation is then performed according to that mask value. The separation results show that, compared with a method based on a deep neural network, the binaural speech separation method based on the LSTM network provided by the invention significantly improves the subjective evaluation index and generalizes well.
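The feature extraction and context stacking described in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patent's exact formulation: the CCF is taken as a normalized cross-correlation over a lag range, the ITD as the lag of the CCF peak (in samples), the IID as the left-to-right energy ratio in dB, and frames near the signal edges are padded by repeating the first or last frame. All function names are hypothetical.

```python
import math


def ccf(left, right, max_lag):
    """Normalized cross-correlation of two equal-length frames for lags -max_lag..max_lag."""
    n = len(left)
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    values = []
    for lag in range(-max_lag, max_lag + 1):
        acc = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                acc += left[i] * right[j]
        values.append(acc / norm if norm > 0 else 0.0)
    return values


def itd_in_samples(ccf_values, max_lag):
    """Lag (in samples) at which the CCF peaks; positive means the right ear lags."""
    best = max(range(len(ccf_values)), key=lambda k: ccf_values[k])
    return best - max_lag


def iid_db(left, right):
    """Left-to-right energy ratio in dB (sign convention is an assumption)."""
    e_left = sum(x * x for x in left)
    e_right = sum(y * y for y in right)
    return 10.0 * math.log10(e_left / e_right)


def stack_context(frames, t, context=5):
    """Concatenate features of frames t-context..t+context, clamping at the edges."""
    last = len(frames) - 1
    stacked = []
    for k in range(t - context, t + context + 1):
        stacked.extend(frames[min(max(k, 0), last)])
    return stacked


# Toy time-frequency unit: the right-ear frame is the left-ear frame
# delayed by one sample and halved in amplitude.
left = [0.0, 1.0, 0.0, 0.0]
right = [0.0, 0.0, 0.5, 0.0]
max_lag = 2

c = ccf(left, right, max_lag)
itd = itd_in_samples(c, max_lag)
iid = iid_db(left, right)
print(itd)            # 1
print(round(iid, 2))  # 6.02

# One feature vector per frame: [ITD, IID, CCF values...]; here the same
# vector is reused for 20 frames purely to show the stacking shape.
frame_features = [[float(itd), iid] + c for _ in range(20)]
network_input = stack_context(frame_features, t=0, context=5)
print(len(network_input))  # 77
```

With a context of five frames on each side, each network input covers 11 frames of one subband, which matches the "current frame and five preceding and five following frames" described above.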

Description

Technical field

[0001] The invention relates to a speech separation algorithm, in particular to a binaural speech separation method based on a long short-term memory (LSTM) network.

Background

[0002] Speech separation is an important research direction in speech signal processing and has a wide range of applications. For example, in teleconferencing systems, speech separation can extract the sound source of interest from multiple speakers, which improves the efficiency of teleconferencing; as a pre-processing stage for speech recognition, it can improve speech quality and help improve recognition accuracy; and in hearing aids, it can provide the hearing-impaired with a more prominent target sound source and more effective speech information.

[0003] Speech separation technology involves a wide range of fields, including but not limited to acoustics, digital signal processing, information communicat...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/0272, G10L25/30
CPC: G10L21/0272, G10L25/30
Inventor: 周琳, 陆思源, 钟秋月, 庄琰
Owner: SOUTHEAST UNIV