
Low-signal-to-noise-ratio speech enhancement method based on information distillation and aggregation

A speech enhancement technology for low signal-to-noise ratios, applied in speech analysis, speech recognition, biological neural network models, etc. It addresses the problem of introduced noise and semantic differences, and achieves high stability, good feature processing ability, and excellent performance.

Pending Publication Date: 2022-01-14
UNIV OF ELECTRONICS SCI & TECH OF CHINA
5 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

[0005] The technical problem to be solved by the present invention is as follows: existing speech enhancement models built on the UNet architecture use skip connections that transfer low-granularity features directly to high-level layers, which introduces noise and semantic differences. The invention provides a more accurate speech enhancement method that addresses this problem.




Embodiment Construction

[0024] In the time domain, a noisy signal x(n) can be expressed as:

[0025] x(n) = s(n) + d(n)

[0026] where n is the time index, s(n) is the original (clean) signal, and d(n) is the noise signal. Note that the number of time frames is not fixed across samples, because the input utterances differ in duration. Given a real-valued time-domain vector x of size N, x(n) is transformed into the time-frequency domain by the short-time Fourier transform (STFT):

[0027]

[0028] where z̄ is the complex conjugate of z, α is the time-shift step size, g is the analysis window (usually a Hanning or Hamming window), l is the length of the original waveform, and N is the number of frequency points. Under this definition of the STFT, the number of time frames is T = l / α, so the output of the STFT is a two-dimensional matrix of size T×F (time frames by frequency points). The method takes the noisy magnitude spectrum as input and, after processing, produces an enhanced magnitude spectrum....
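The signal model and transform described above can be illustrated with a minimal sketch. It is not the patent's implementation: the helper names `mix_at_snr`, `hann`, and `stft_magnitude`, the sine wave standing in for clean speech, the window length, the hop size, and the -5 dB target are all assumptions chosen for illustration. The sketch builds x(n) = s(n) + d(n) at a low SNR and computes a T×F magnitude spectrogram with a Hann analysis window and a naive DFT.

```python
import cmath
import math
import random

def mix_at_snr(s, d, snr_db):
    """Scale noise d so that x(n) = s(n) + d(n) has the requested SNR in dB."""
    s_power = sum(v * v for v in s) / len(s)
    d_power = sum(v * v for v in d) / len(d)
    scale = math.sqrt(s_power / (d_power * 10.0 ** (snr_db / 10.0)))
    d_scaled = [scale * v for v in d]
    x = [a + b for a, b in zip(s, d_scaled)]
    return x, d_scaled

def hann(win_len):
    """Periodic Hann analysis window g (assumed; a Hamming window would also fit the text)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / win_len) for i in range(win_len)]

def stft_magnitude(x, win_len=64, hop=16):
    """Return a T x F magnitude spectrogram as a list of rows, F = win_len // 2 + 1."""
    g = hann(win_len)
    n_frames = 1 + (len(x) - win_len) // hop  # no padding, so slightly fewer than len(x) / hop
    n_bins = win_len // 2 + 1                 # non-negative frequencies of a real signal
    spec = []
    for t in range(n_frames):
        frame = [x[t * hop + i] * g[i] for i in range(win_len)]
        row = []
        for f in range(n_bins):
            acc = sum(frame[i] * cmath.exp(-2j * math.pi * f * i / win_len)
                      for i in range(win_len))
            row.append(abs(acc))
        spec.append(row)
    return spec

random.seed(0)
sr = 1600                                                          # toy sample rate (assumption)
s = [math.sin(2.0 * math.pi * 100.0 * n / sr) for n in range(sr)]  # stand-in for clean speech
d = [random.gauss(0.0, 1.0) for _ in range(sr)]                    # white Gaussian noise
x, d_scaled = mix_at_snr(s, d, snr_db=-5.0)
mag = stft_magnitude(x)
measured_snr = 10.0 * math.log10(sum(v * v for v in s) / sum(v * v for v in d_scaled))
```

Because this sketch does not pad the signal, the number of frames is a little smaller than the ratio of signal length to hop size; a padded STFT would come closer to that ratio. The naive DFT is used only to keep the example dependency-free; an FFT routine would normally replace it.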


Abstract

The invention provides a low-signal-to-noise-ratio speech enhancement method based on information distillation and aggregation. The method comprises the following steps: performing speech feature extraction on an original speech spectrogram to obtain a speech information representation; performing multi-stage information distillation on the speech information representation to obtain a distillation result from which noise components have been filtered; and reconstructing the spectrogram from that distillation result. The multi-stage information distillation is built from an attention mechanism and an information distillation mechanism: at each step, the calibrated information on the information distillation line is used as the input of the self-attention information processing sub-module at the next step. By passing sequentially through N attention information processing sub-modules and N information distillation sub-modules, which distill and recalibrate the information in turn, the noise components are filtered out. The method adapts speech feature extraction to different environments, so that the model can adapt to the acoustic characteristics of different noises, and the speech enhancement effect is remarkably improved.
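The alternation of attention and distillation sub-modules described in the abstract can be sketched structurally as follows. This is a loose illustration under stated assumptions, not the patented network: the function names, the parameter-free sigmoid gate standing in for a learned attention module, the channel-split form of distillation, and aggregation by concatenation are all hypothetical choices. A real model would use learned convolutional or self-attention layers at each stage.

```python
import math

def channel_attention(channels):
    """Recalibrate each channel with a sigmoid gate of its mean (toy stand-in for learned attention)."""
    out = []
    for ch in channels:
        gate = 1.0 / (1.0 + math.exp(-sum(ch) / len(ch)))
        out.append([gate * v for v in ch])
    return out

def distill(channels, keep):
    """Split channels: the first `keep` are retained, the rest continue to the next stage."""
    return channels[:keep], channels[keep:]

def distill_and_aggregate(channels, n_stages, keep):
    """Run n_stages of (attention, distillation) in sequence, then aggregate by concatenation."""
    retained = []
    current = channels
    for _ in range(n_stages):
        current = channel_attention(current)    # attention sub-module recalibrates
        kept, current = distill(current, keep)  # distillation sub-module splits
        retained.extend(kept)
    return retained + current                   # aggregation of distilled and remaining channels

# Toy feature map: 8 channels, each a flattened patch of 4 values.
features = [[float(c + i) / 10.0 for i in range(4)] for c in range(8)]
out = distill_and_aggregate(features, n_stages=3, keep=2)
```

With 8 input channels, 3 stages, and 2 channels retained per stage, the output again has 8 channels: 6 retained along the way plus the 2 that passed through every stage and were therefore gated three times.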

Description

Technical field

[0001] The invention relates to speech enhancement and speech noise reduction technology.

Background technique

[0002] Speech enhancement aims to improve the intelligibility and clarity of speech signals by separating speech from noise components. It has a major impact on areas such as automatic speech recognition, hearing aids, and mobile devices, and has received great attention. In recent years, thanks to advances in deep learning, research on speech enhancement has increased significantly, and a large number of deep-learning-based methods have achieved effective separation of clean speech and noise.

[0003] A multi-layer deep neural network (DNN) has been used to enhance speech through a nonlinear mapping from the noisy logarithmic power spectrum to clean speech, which demonstrates the effectiveness of deep neural networks on the speech enhancement task; compared with the DNN-based model Compared with CNN, t...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L21/0208, G10L25/27, G10L25/30, G10L15/02, G10L15/06, G06N3/04
CPC: G10L21/0208, G10L25/27, G10L25/30, G10L15/02, G10L15/06, G10L2015/0631, G06N3/045
Inventor: 蓝天, 刘峤, 吴祖峰, 台文鑫, 王钆翔, 李佳佳, 陈聪, 冯雨佳, 康宏博
Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA