Single channel mixed speech time domain separation method based on Convolutional Neural Network

A convolutional neural network and mixed-speech technology, applied to speech analysis, instruments, and related fields, that addresses difficult phase recovery, mutual interference between the separated signals, and limited separation quality.

Inactive Publication Date: 2017-06-13
DALIAN UNIV OF TECH
Cites: 6. Cited by: 40.

AI Technical Summary

Problems solved by technology

However, current neural-network-based methods generally use a fully connected neural network (FCNN) or a recurrent neural network (RNN) and usually require extracting magnitude-spectrum features of the speech signal, so they do not make good use of the strong feature-representation ability of the convolutional neural network itself. In addition, because they operate on magnitude-spectrum features, they face a difficult phase-recovery problem when restoring the source signals.
Therefore, in traditional neural-network-based separation methods the two separated source-signal estimates interfere with each other, and the separation quality needs to be improved.
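The phase-recovery difficulty can be illustrated with a small sketch. It is not taken from the patent: the toy frame and the helper name `dft_magnitudes` are assumptions made for illustration. It shows that two different real waveforms can share exactly the same magnitude spectrum, so magnitude features alone do not determine the time-domain signal.

```python
import cmath

def dft_magnitudes(x):
    """Return the magnitude of each DFT bin of a real sequence (naive O(N^2) DFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# Hypothetical toy frame; any non-symmetric real sequence works.
frame = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.1, -0.6]

# Circular time reversal: reversed_frame[t] = frame[(-t) mod N].
# For a real signal this conjugates the spectrum, leaving magnitudes unchanged.
reversed_frame = [frame[-t % len(frame)] for t in range(len(frame))]

mag_a = dft_magnitudes(frame)
mag_b = dft_magnitudes(reversed_frame)

same_magnitude = all(abs(a - b) < 1e-9 for a, b in zip(mag_a, mag_b))
same_waveform = frame == reversed_frame
```

Here `same_magnitude` is true while `same_waveform` is false, which is why a method that predicts only magnitudes needs a separate step to recover phase.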



Examples


Embodiment Construction

[0048] The present invention will be further described below in conjunction with the drawings.

[0049] As shown in Figure 1, the time-domain separation method for single-channel mixed speech based on a convolutional neural network includes the following steps:

[0050] Step 1. Establish a speech data set for training. Randomly select a large amount of speech data from a standard database, such as the TSP speech database, and divide it into two groups: 80% of the speech data is used as training data and the remaining 20% is used as test data.
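A minimal sketch of the 80/20 split in Step 1 follows. The function name `split_dataset`, the fixed seed, and the file names are hypothetical; the text only specifies random selection and the 80%/20% proportions.

```python
import random

def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded only for reproducibility
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical utterance file names standing in for the speech database.
utterances = [f"utt_{i:03d}.wav" for i in range(10)]
train_files, test_files = split_dataset(utterances)
```

With ten utterances this yields eight training files and two test files, and no file appears in both groups.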

[0051] Step 2. Preprocess the speech data. First, use formula (1) to normalize the original speech data to the range [-1, 1].

[0052] $$y_i = \frac{s_i}{\max(\mathrm{abs}(s_i))} \tag{1}$$

[0053] where $s_i$ denotes the i-th source signal, $\max(\cdot)$ denotes the maximum value, $\mathrm{abs}(s_i)$ takes the absolute value of each element of $s_i$, and $y_i$ denotes the normalized i-th source signal. Formula (2) is then used to split the time-domain speech signal into frames. The frame length is N=1024, a...
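The preprocessing in Step 2 can be sketched as below, under stated assumptions. Normalization divides by the peak absolute value, as the definitions accompanying formula (1) suggest. Formula (2) and the frame shift are not visible in the text, so the `frame_shift=512` default is purely an assumption, as are the function names and the choice to drop trailing samples that do not fill a frame.

```python
def normalize(signal):
    """Scale a signal by its peak absolute value so it lies in [-1, 1]."""
    peak = max(abs(v) for v in signal)
    if peak == 0:
        return list(signal)  # silent signal: nothing to scale
    return [v / peak for v in signal]

def frame_signal(signal, frame_length=1024, frame_shift=512):
    """Cut a signal into frames of frame_length samples.

    frame_shift is an assumed value; the source text is truncated before
    stating it. Trailing samples that do not fill a frame are dropped.
    """
    last_start = len(signal) - frame_length
    return [signal[start:start + frame_length]
            for start in range(0, last_start + 1, frame_shift)]

# Toy example: the peak magnitude is 2.0, so every sample is divided by 2.0.
normalized = normalize([0.5, -2.0, 1.0, 0.25])
frames = frame_signal([0.0] * 2048)  # frames start at samples 0, 512 and 1024
```

A signal shorter than one frame produces an empty list; padding instead of dropping is an equally plausible choice that the visible text does not settle.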


Abstract

The invention relates to a time-domain separation method for single-channel mixed speech, specifically one based on a convolutional neural network. The method comprises the following steps: 1, constructing a speech data set for training; 2, preprocessing the speech data; 3, obtaining mixed speech data; 4, constructing the neural network structure; 5, training the neural network in a supervised manner with the organized data; and 6, running a separation test with the trained neural network. In this method, time-domain speech signals serve as both the input and the output of the convolutional neural network, which separates the single-channel mixed speech and thereby yields estimates of the two source signals. The method does not need to handle the phase-recovery problem, and it improves the separation quality of single-channel speech.
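The data flow described in the abstract can be sketched at the level of shapes only. Everything specific here is an assumption rather than the patented design: the additive mixing, the single convolutional layer with one kernel per source, the random untrained weights, and the mean-squared-error loss. The sketch only shows a mixed time-domain frame going in, two time-domain estimates coming out, and a supervised loss computed against the known sources.

```python
import random

def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding; output length equals len(x).

    Assumes an odd kernel length.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]

def separate_frame(mixed_frame, kernels):
    """Return one time-domain estimate per kernel (one kernel per source)."""
    return [conv1d_same(mixed_frame, k) for k in kernels]

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

rng = random.Random(0)
source_1 = [rng.uniform(-1, 1) for _ in range(16)]  # toy stand-ins for speech
source_2 = [rng.uniform(-1, 1) for _ in range(16)]
mixed = [a + b for a, b in zip(source_1, source_2)]  # assumed additive mixing

# Untrained random kernels; training would adjust these to reduce the loss.
kernels = [[rng.gauss(0, 0.1) for _ in range(5)] for _ in range(2)]
estimate_1, estimate_2 = separate_frame(mixed, kernels)

# Supervised objective: compare each estimate with its known source.
loss = mse(estimate_1, source_1) + mse(estimate_2, source_2)
```

Because both the input and the outputs are waveforms, no spectrum is computed anywhere in this flow, which is the property the abstract credits with avoiding phase recovery.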

Description

Technical field

[0001] The invention relates to a time-domain separation method for single-channel mixed speech, and more specifically, to a time-domain separation method for single-channel mixed speech based on a convolutional neural network.

Background art

[0002] Single-channel blind source separation (Monaural Blind Source Separation, MBSS) is an important technology in the field of speech processing. It can obtain estimates of two source signals when only a single-channel mixed speech signal is available. Single-channel speech separation technology has important application value in speech recognition, speech enhancement, speech identification and other fields.

[0003] Typical single-channel speech separation methods include those based on non-negative matrix factorization (NMF) and those based on neural networks. Since single-channel mixed speech contains little information, it is difficult to achieve satisfactory separation results based on non-negative matrix...


Application Information

IPC(8): G10L21/0272, G10L21/0224, G10L25/30
CPC: G10L21/0224, G10L21/0272, G10L25/30
Inventors: 张鹏, 马晓红
Owner: DALIAN UNIV OF TECH