Low-signal-to-noise-ratio speech enhancement method based on information distillation and aggregation

A low-signal-to-noise-ratio speech enhancement technology, applied in speech analysis, speech recognition, biological neural network models, and related fields, that addresses the problem of noise and semantic differences and achieves high stability, good feature processing ability, and excellent performance.

Pending Publication Date: 2022-01-14

UNIV OF ELECTRONICS SCI & TECH OF CHINA

Cites: 5. Cited by: 2.

Problems solved by technology

[0005] The technical problem to be solved by the present invention is to provide a more accurate speech enhancement method. Existing speech enhancement models built on the UNet architecture use skip connections to pass low-granularity features directly to the high-level layers, which introduces noise and semantic differences; the invention addresses this problem.

Embodiment Construction

[0024] In the time domain, a noisy signal x(n) can be expressed as:

[0025] x(n)=s(n)+d(n)

[0026] Here, n is the time index, s(n) is the clean speech signal, and d(n) is the noise signal. Note that the number of time frames is not fixed across samples, because the input utterances differ in duration. Given a real-valued vector x of size N in the time domain, x(n) is transformed into the time-frequency domain by the short-time Fourier transform (STFT):

[0027]

[0028] where z̄ is the complex conjugate of z, α is the time-shift step size, g is an analysis window (usually a Hanning or Hamming window), l is the length of the original waveform, and N is the number of frequency points. In this definition of the STFT, the number of time frames is T = l / α. Therefore, the output of the STFT is a two-dimensional matrix of size T×F. The input is a noisy magnitude spectrum, and after processing, an enhanced magnitude spectrum is obtained....
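The additive model x(n) = s(n) + d(n) and the STFT step can be illustrated with a minimal NumPy sketch. It is not the patent's implementation: the helper names `mix_at_snr` and `stft_magnitude`, the 16 kHz test tone, the -5 dB SNR, the window length of 256, and the hop of 128 are all assumptions chosen for illustration. The signal is indexed circularly (also an assumption) so that the number of frames equals l / α, as the text states.

```python
import numpy as np


def mix_at_snr(s, d, snr_db):
    """Scale noise d so that x = s + d has the requested SNR (in dB)."""
    p_s = np.mean(s ** 2)                      # clean-speech power
    p_d = np.mean(d ** 2)                      # noise power before scaling
    gain = np.sqrt(p_s / (p_d * 10.0 ** (snr_db / 10.0)))
    d_scaled = gain * d
    return s + d_scaled, d_scaled


def stft_magnitude(x, win_len=256, hop=128):
    """Magnitude STFT with a Hann window and circular indexing.

    Returns a T x F matrix with T = len(x) // hop frames and
    F = win_len // 2 + 1 frequency points.
    """
    l = len(x)
    n_frames = l // hop                        # T = l / alpha
    g = np.hanning(win_len)                    # real-valued analysis window
    # Row t holds the sample indices of frame t, wrapped modulo l.
    idx = (np.arange(n_frames)[:, None] * hop
           + np.arange(win_len)[None, :]) % l
    frames = x[idx] * g
    return np.abs(np.fft.rfft(frames, axis=1))


# Hypothetical test signal: a 440 Hz tone standing in for clean speech.
rng = np.random.default_rng(0)
fs = 16000
n = np.arange(fs)
s = np.sin(2 * np.pi * 440 * n / fs)
d = rng.standard_normal(fs)

x, d_scaled = mix_at_snr(s, d, snr_db=-5.0)    # low-SNR mixture
mag = stft_magnitude(x)
print(mag.shape)                               # (125, 129)
```

With one second of audio at 16 kHz and a hop of 128, the sketch yields T = 125 frames and F = 129 frequency points; a longer utterance gives a larger T, which is why the time dimension is not fixed across samples.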

Abstract

The invention provides a low-signal-to-noise-ratio speech enhancement method based on information distillation and aggregation. The method comprises the following steps: performing speech feature extraction on an original speech spectrogram to obtain a speech information representation; performing multi-stage information distillation on the speech information representation to obtain a distillation result from which noise components have been filtered; and performing spectrogram reconstruction on the distillation result. The multi-stage information distillation process is built from an attention mechanism and an information distillation mechanism: the calibrated information on the information distillation line at each moment is used as the input of the self-attention information processing sub-module at the next moment, and the noise components are filtered out through successive distillation and recalibration by the N attention information processing sub-modules and the N information distillation sub-modules. The method adapts speech feature extraction to different environments, so that the model can adapt to the acoustic features of different noises, and the speech enhancement effect is significantly improved.
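The staged structure described in the abstract (attention-based recalibration followed by distillation, repeated N times, then aggregation) can be sketched roughly as follows. This is an illustrative guess, not the patented architecture: the abstract does not say how distillation is performed, so the sketch assumes channel splitting (each stage keeps a slice of channels and forwards the rest), a simple sigmoid channel gate in place of the self-attention sub-module, and random untrained weights. The names `channel_attention` and `distill_and_aggregate` and every parameter are hypothetical.

```python
import numpy as np


def channel_attention(h):
    """Stand-in for the attention sub-module (assumption): gate each
    channel with a sigmoid of its mean activation."""
    gate = 1.0 / (1.0 + np.exp(-h.mean(axis=(0, 1))))   # shape (C,)
    return h * gate


def distill_and_aggregate(features, n_stages=3, keep_ratio=0.25, seed=0):
    """Run n_stages of recalibrate -> transform -> split, then
    concatenate every retained slice with the final remainder."""
    rng = np.random.default_rng(seed)
    h = features
    retained = []
    for _ in range(n_stages):
        h = channel_attention(h)                 # recalibration
        c = h.shape[-1]
        # Untrained placeholder weights; a real model would learn these.
        w = rng.standard_normal((c, c)) / np.sqrt(c)
        h = np.maximum(h @ w, 0.0)               # linear map + ReLU
        k = max(1, int(c * keep_ratio))
        retained.append(h[..., :k])              # distilled slice is kept
        h = h[..., k:]                           # remainder feeds next stage
    retained.append(h)
    return np.concatenate(retained, axis=-1)     # aggregation


# Hypothetical feature map: T = 4 frames, F = 5 bins, C = 64 channels.
feats = np.ones((4, 5, 64))
out = distill_and_aggregate(feats)
print(out.shape)                                 # (4, 5, 64)
```

Because the retained slices and the final remainder partition the channels, the aggregated output has the same channel count as the input, which keeps a later spectrogram-reconstruction step simple in this sketch.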

Description

Technical field

[0001] The invention relates to speech enhancement and speech noise reduction technology.

Background

[0002] Speech enhancement aims to improve the intelligibility and clarity of speech signals by separating speech and noise components. It has a major impact on applications such as automatic speech recognition, hearing aids, and mobile devices, and has received considerable attention. In recent years, thanks to advances in deep learning, research on speech enhancement has increased significantly, and many deep-learning-based methods have achieved effective separation of clean speech and noise.

[0003] A multi-layer deep neural network (DNN) has been used to enhance speech by learning the nonlinear mapping from the noisy log-power spectrum to clean speech, which demonstrates the effectiveness of deep neural networks on the speech enhancement task; compared with the DNN-based model, CNN, t...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L21/0208, G10L25/27, G10L25/30, G10L15/02, G10L15/06, G06N3/04

CPC: G10L21/0208, G10L25/27, G10L25/30, G10L15/02, G10L15/06, G10L2015/0631, G06N3/045

Inventor: 蓝天, 刘峤, 吴祖峰, 台文鑫, 王钆翔, 李佳佳, 陈聪, 冯雨佳, 康宏博

Owner: UNIV OF ELECTRONICS SCI & TECH OF CHINA
