Improved speech enhancement method based on multi-objective criterion learning

A speech enhancement technique based on multi-objective learning, applicable to speech analysis, speech recognition, instruments, and related fields. It addresses the poor performance of existing training targets, is easy to implement, improves the SNR, and optimizes the objective function.

Inactive Publication Date: 2019-07-26
TIANJIN UNIV

AI Technical Summary

Problems solved by technology

Training targets based on floating-value masks have been found to outperform those based on binary masks in improving speech quality and intelligibility, while training targets based on the spectral envelope are the least effective.

Examples

Specific example

[0047] (1) Signal preprocessing

[0048] 1. Select data:

[0049] Select 600 sentences from the TIMIT standard corpus as the clean training speech, sampled at 16 kHz. Select four noise types (Factory, F16, White, and Pink) from the NOISEX-92 standard noise library as the training noise. Mix the clean speech with each noise at SNRs of -5 dB, -2 dB, 0 dB, and 2 dB to obtain the training data set.

[0050] Select 120 of the remaining TIMIT sentences as the clean test speech, again sampled at 16 kHz. Select the Factory noise from the NOISEX-92 standard noise library as the test noise, and mix it with the clean speech at SNRs of -5 dB, -2 dB, 0 dB, and 2 dB to obtain the test data set.
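
The mixing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the helper name `mix_at_snr` is hypothetical, SNR is taken as the ratio of average clean-speech power to average noise power over the whole utterance, and a synthetic sine tone and Gaussian noise stand in for the TIMIT and NOISEX-92 recordings. Only the standard library is used so the sketch runs without extra packages.

```python
import math
import random


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals snr_db, then add it to `clean`."""
    if len(noise) < len(clean):
        raise ValueError("noise must be at least as long as the clean signal")
    noise = noise[:len(clean)]
    clean_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(n * n for n in noise) / len(noise)
    if noise_power == 0.0:
        raise ValueError("noise segment is silent")
    # Gain g chosen so that 10*log10(clean_power / (g^2 * noise_power)) == snr_db
    gain = math.sqrt(clean_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return [x + gain * n for x, n in zip(clean, noise)]


# Synthetic stand-ins for a TIMIT utterance and a NOISEX-92 clip (assumption)
SAMPLE_RATE = 16000
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)]
noise = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]

SNR_LEVELS_DB = [-5, -2, 0, 2]  # mixing SNRs listed in the text
mixtures = {snr: mix_at_snr(clean, noise, snr) for snr in SNR_LEVELS_DB}
```

Each entry of `mixtures` is one noisy utterance. Repeating the call over every sentence, noise type, and SNR level would give a data set with the composition described in the text.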

[0051] 2. Framing and windowing

[0052] The speech signal is divided into frames of 320 samples (20 ms at 16 kHz) with a frame shift of 160 samples (10 ms), and each frame is weighted by a Hamming window.
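
The framing parameters above can be illustrated with a short sketch. The function names `hamming` and `frame_signal` are hypothetical, the symmetric form of the Hamming window is assumed because the text does not say which variant is used, and trailing samples that do not fill a whole frame are simply dropped.

```python
import math

FRAME_LENGTH = 320  # samples per frame, 20 ms at 16 kHz
FRAME_SHIFT = 160   # samples between frame starts, 10 ms at 16 kHz


def hamming(length):
    """Symmetric Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1)) (assumed variant)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_signal(signal, frame_length=FRAME_LENGTH, frame_shift=FRAME_SHIFT):
    """Split `signal` into overlapping frames and apply a Hamming window to each.

    Trailing samples that do not fill a complete frame are dropped (assumption).
    """
    window = hamming(frame_length)
    frames = []
    for start in range(0, len(signal) - frame_length + 1, frame_shift):
        segment = signal[start:start + frame_length]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames


signal = [1.0] * 16000  # one second of a constant signal at 16 kHz
frames = frame_signal(signal)
```

With a 160-sample shift, consecutive frames overlap by half a frame, so every sample away from the signal edges contributes to two frames.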

[0053] (2) Calculate the logarithmic power ...
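
The source text for this step is truncated, so the following is only a generic sketch of a per-frame log power spectrum, not the patent's exact formula. It assumes a DFT length equal to the 320-sample frame (161 non-negative-frequency bins), the natural logarithm, and a small floor to avoid taking the logarithm of zero. The function name `log_power_spectrum` and the `floor` parameter are hypothetical.

```python
import cmath
import math


def log_power_spectrum(frame, floor=1e-12):
    """Return log(|X[k]|^2) for bins k = 0..N/2 of one windowed frame."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT for bin k; an FFT routine gives the same values faster
        x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(frame))
        power = abs(x_k) ** 2
        spectrum.append(math.log(max(power, floor)))  # floor guards against log(0)
    return spectrum


# 320-sample test frame: a cosine that falls exactly on bin 20 (1 kHz at 16 kHz)
frame = [math.cos(2 * math.pi * 20 * i / 320) for i in range(320)]
lps = log_power_spectrum(frame)
```

For this test frame the energy concentrates in bin 20, so the largest value of `lps` sits at index 20. In a full pipeline the function would be applied to every windowed frame of the noisy speech.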

Abstract

The invention relates to an improved speech enhancement method based on multi-objective criterion learning. The method comprises the following steps: preprocessing the signal, in which a training data set and a test data set are obtained and then framed and windowed, with the window function type, frame length, and frame shift determined; calculating the log power spectrum of each frame of noisy speech in the framed and windowed training and test data sets; calculating the objective function for multi-objective training; training a deep neural network; testing the network, in which the log power spectrum of each frame of noisy speech in the test data set is fed to the deep neural network as the input feature; and evaluating the result, using speech intelligibility, subjective speech quality evaluation, and speech quality as the evaluation indexes for the intelligibility, perceptual effect, and quality of the enhanced speech respectively. The method eliminates the adverse influence of the phase information of the noisy speech signal on the intelligibility and quality of the enhanced speech, and it is convenient and easy to implement.

Description

Technical field

[0001] The invention relates to a speech enhancement method, and in particular to an improved speech enhancement method based on multi-objective criterion learning.

Background art

[0002] Speech enhancement is the technology of extracting the useful speech signal (that is, clean speech) from a noisy background as far as possible when the speech signal is disturbed or even submerged by various noises, while suppressing and reducing the noise interference. In recent years a variety of speech enhancement methods have been proposed, mainly methods based on signal processing, methods based on statistical models, and methods based on model training. Among the signal-processing methods, spectral subtraction and Wiener filtering are the two most representative algorithms. When the background noise is estimated correctly, methods of this type can achieve good speech enhancement performance. However, in the real environment, espec...

Application Information

Patent type and authority: Application (China)
IPC(8): G10L25/60, G10L25/21, G10L25/24, G10L25/30, G10L25/45, G10L21/02, G10L15/06
CPC: G10L15/063, G10L21/0202, G10L25/21, G10L25/24, G10L25/30, G10L25/45, G10L25/60
Inventors: 张涛, 邵洋洋
Owner: TIANJIN UNIV