Multi-speaker voice separation method based on voiceprint features and generative adversarial learning

A voiceprint-feature and speech-separation technology, applied in neural learning methods, speech analysis, biological neural network models, etc. It addresses the insufficient robustness of existing methods, the high complexity of deep models, and poor speech separation quality, and improves invariance.

Active Publication Date: 2020-05-08
BEIJING UNIV OF POSTS & TELECOMM
Cites: 3; Cited by: 13

AI Technical Summary

Problems solved by technology

However, although deep models based on spectral mapping are highly complex and have strong modeling ability, their generalization depends heavily on the dataset: when data are insufficient, the learned spectral mapping is not robust enough. In addition, the selected features are usually generic, so spectral-mapping-based separation fails to exploit either the auditory selectivity of the human ear or the voice characteristics of individual speakers, and its separation quality is poor.



Examples


Example 1

[0057] This embodiment proposes a speech separation method based on voiceprint features and generative adversarial learning, for multi-speaker speech separation in speech recognition. "Multi-speaker" here refers to a scene in which several people speak at the same time, and the goal of separation is to extract the speech of the target speaker. Preferably, such scenes include: removing the voices of unrelated speakers or background sound in an intelligent-conference simultaneous interpretation system; suppressing the voices of non-target speakers on the device side before the voice signal is transmitted, to improve the voice quality and intelligibility of conference communication; and, as smart-city applications develop, collecting the speaker's signal for voice interaction in smart homes, autonomous driving, security monitoring, and other fields.

[0058] Figure 1 is a schematic...

Example 2

[0099] This embodiment provides a multi-speaker speech separation system based on voiceprint features and generative adversarial learning. Figure 4 is a schematic structural diagram of this system. As shown in Figure 4, the system includes an anchor sample collection module, a hybrid preprocessing module, a voiceprint feature extraction module, at least one discriminator, and at least one generator.

[0100] The anchor sample collection module is connected to the hybrid preprocessing module and the voiceprint feature extraction module. It takes the pure speech of the target speaker (i.e., the anchor sample) as the pure training corpus and provides this corpus to both the hybrid preprocessing module and the voiceprint feature extraction module.
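The excerpt does not say how the voiceprint feature extraction module computes its features. As a rough stand-in only, the sketch below turns a waveform into a fixed-length vector by averaging per-frame log-magnitude spectra, then compares two such vectors with cosine similarity, which is the kind of comparison needed between the pure training corpus and a separated output. The function names, the 128-sample frame, the 64-sample hop, and the 8 kHz sample rate are all assumptions; a practical system would more likely use a trained speaker-embedding network.

```python
import cmath
import math


def dft_basis(n):
    """Complex exponentials exp(-2*pi*i*k*t/n) for bins k = 0..n/2."""
    return [[cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]
            for k in range(n // 2 + 1)]


def frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * t / frame_len) for t in range(frame_len)]
    return [[signal[start + t] * window[t] for t in range(frame_len)]
            for start in range(0, len(signal) - frame_len + 1, hop)]


def log_magnitude_spectrum(frame, basis):
    """Naive DFT of one frame as log-magnitudes (a small floor avoids log(0))."""
    return [math.log(abs(sum(x * e for x, e in zip(frame, row))) + 1e-8) for row in basis]


def voiceprint(signal, frame_len=128, hop=64):
    """Hypothetical stand-in voiceprint: mean log spectrum, mean-removed and unit-normalised."""
    basis = dft_basis(frame_len)
    specs = [log_magnitude_spectrum(f, basis) for f in frames(signal, frame_len, hop)]
    if not specs:
        raise ValueError("signal is shorter than one frame")
    n_bins = len(basis)
    avg = [sum(s[k] for s in specs) / len(specs) for k in range(n_bins)]
    offset = sum(avg) / n_bins
    centred = [v - offset for v in avg]
    norm = math.sqrt(sum(v * v for v in centred)) or 1.0
    return [v / norm for v in centred]


def cosine(a, b):
    """Cosine similarity of two unit-length vectors (here simply their dot product)."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    sr = 8000  # assumed sample rate
    low = [math.sin(2 * math.pi * 200 * t / sr) for t in range(2000)]
    high = [math.sin(2 * math.pi * 1500 * t / sr) for t in range(2000)]
    v_low = voiceprint(low)
    print(cosine(v_low, v_low), cosine(v_low, voiceprint(high)))
```

Averaging over frames makes the vector independent of utterance length, which is the property a voiceprint comparison needs; the pure tones in the usage example merely stand in for two different speakers.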

[0101] The hybrid preprocessing module is connected with the voiceprint...
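The hybrid preprocessing step mixes the target speaker's anchor sample with other speakers' speech and noise to form training mixtures. A common way to do this is to rescale each interfering source to a chosen power ratio relative to the target before summing. The sketch below assumes that approach; the helper names, the default ratios (0 dB for speakers, 10 dB for noise), and the equal-length requirement are hypothetical, since the excerpt gives no mixing parameters.

```python
import math


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def scale_to_ratio(reference, interference, ratio_db):
    """Rescale interference so that 10*log10(P_reference / P_interference) == ratio_db."""
    p_ref, p_int = power(reference), power(interference)
    if p_int == 0.0:
        return list(interference)  # silent source: nothing to scale
    gain = math.sqrt(p_ref / (p_int * 10 ** (ratio_db / 10)))
    return [gain * v for v in interference]


def mix(target, interferers, noise, sir_db=0.0, snr_db=10.0):
    """Hypothetical hybrid preprocessing: target + interfering speakers + noise.

    Each interfering speaker is scaled to sir_db relative to the target and the
    noise to snr_db; all inputs must have the same length.
    """
    n = len(target)
    if len(noise) != n or any(len(s) != n for s in interferers):
        raise ValueError("all signals must have the same length")
    mixture = list(target)
    for speaker in interferers:
        mixture = [m + s for m, s in zip(mixture, scale_to_ratio(target, speaker, sir_db))]
    return [m + s for m, s in zip(mixture, scale_to_ratio(target, noise, snr_db))]


def ratio_db(reference, other):
    """Measured power ratio in dB, useful for checking the scaling."""
    return 10 * math.log10(power(reference) / power(other))
```

Each training example would then pair `mix(target, [other], noise)` with the unmodified `target`, so that the generator learns to recover the anchor speech from the mixture.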


Abstract

The invention provides a multi-speaker voice separation method based on voiceprint features and generative adversarial learning, which addresses the problem that voice separation in the prior art is not sufficiently accurate or clean. The method comprises the following steps: mixing audio data of a target speaker, other irrelevant speakers, and noise to obtain an initial mixed training corpus; extracting voiceprint features from the pure training corpus of the target speaker and from the separation result of an initialized generator, and using them to train a discriminator; freezing the discriminator parameters and then training the generator; and using the generator, with its parameters frozen, to separate the target speaker's voice from the speech to be separated. Through generative adversarial learning, the method can generate samples similar to the target, and the generative adversarial network continually pulls the output distribution toward the target distribution. This reduces the distribution gap between speech data captured in a multi-speaker interference environment and the real training data of the target speaker, enabling the target speaker's audio to be tracked and recognized.
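The abstract describes an alternating schedule: the discriminator is trained on voiceprint features of the pure corpus versus the generator's output while the generator is held fixed, then the discriminator's parameters are frozen and the generator is trained. The toy sketch below illustrates only that freezing pattern, on one-dimensional features with hand-written gradients. `Generator` is a hypothetical linear map from a mixture feature to an estimated target feature, `Discriminator` is logistic regression, and the generator's output stands in directly for the voiceprint of the separated speech; none of these models, losses, or hyperparameters come from the patent.

```python
import math
import random

EPS = 1e-12  # keeps log() finite when a probability reaches 0 or 1


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class Generator:
    """Hypothetical toy generator: estimated target feature = a * mixture_feature + b."""

    def __init__(self):
        self.a, self.b = 1.0, 0.0

    def __call__(self, m):
        return self.a * m + self.b


class Discriminator:
    """Hypothetical toy discriminator: P(feature comes from the pure target corpus)."""

    def __init__(self):
        self.w, self.c = 0.0, 0.0

    def __call__(self, v):
        return sigmoid(self.w * v + self.c)


def d_loss(D, G, real, mixtures):
    """Binary cross-entropy: pure-corpus features labelled 1, generator outputs labelled 0."""
    fake = [G(m) for m in mixtures]
    return (-sum(math.log(D(v) + EPS) for v in real) / len(real)
            - sum(math.log(1.0 - D(u) + EPS) for u in fake) / len(fake))


def train_discriminator(D, G, real, mixtures, steps, lr):
    """Phase 1: full-batch gradient descent on D; G is only evaluated, never updated."""
    fake = [G(m) for m in mixtures]
    for _ in range(steps):
        gw = (-sum((1 - D(v)) * v for v in real) / len(real)
              + sum(D(u) * u for u in fake) / len(fake))
        gc = (-sum(1 - D(v) for v in real) / len(real)
              + sum(D(u) for u in fake) / len(fake))
        D.w -= lr * gw
        D.c -= lr * gc


def g_loss(G, D, mixtures):
    """Non-saturating generator loss: -mean log D(G(m))."""
    return -sum(math.log(D(G(m)) + EPS) for m in mixtures) / len(mixtures)


def train_generator(G, D, mixtures, steps, lr):
    """Phase 2: D's parameters are frozen (read only); only G.a and G.b change."""
    for _ in range(steps):
        ga = gb = 0.0
        for m in mixtures:
            du = -(1 - D(G(m))) * D.w  # derivative of -log sigmoid(w*u + c) w.r.t. u
            ga += du * m
            gb += du
        G.a -= lr * ga / len(mixtures)
        G.b -= lr * gb / len(mixtures)


if __name__ == "__main__":
    random.seed(0)
    real = [random.gauss(3.0, 0.3) for _ in range(200)]      # toy voiceprints of pure target speech
    mixtures = [random.gauss(0.0, 1.0) for _ in range(200)]  # toy features of mixed input speech
    G, D = Generator(), Discriminator()
    for round_ in range(10):
        train_discriminator(D, G, real, mixtures, steps=50, lr=0.1)  # generator held fixed
        train_generator(G, D, mixtures, steps=50, lr=0.1)            # discriminator frozen
        print(round_, round(G.b, 3), round(d_loss(D, G, real, mixtures), 3))
```

In the patent's setting both phases would operate on real separation networks and voiceprint extractors; the point of the sketch is only that each phase reads the other network's parameters without updating them.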

Description

Technical Field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a multi-speaker speech separation method based on voiceprint features and generative adversarial learning.

Background

[0002] Automatic Speech Recognition (ASR) converts the lexical content of human speech into computer-readable input, so that computers can recognize human language. As a channel of communication between humans and computers, it is regarded as a basic means of future human-technology interaction. People speak in different environments with different kinds of interference, so to recognize the target speaker's speech accurately, the collected audio must first be separated. Speech separation covers speech enhancement, multi-speaker separation, and dereverberation, of which multi-speaker separation is the most common. For example, in an instant interpretation system for intelligent conferences, on the one hand, when th...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L17/00, G10L17/02, G10L17/04, G10L17/06, G10L17/18, G06N3/04, G06N3/08
CPC: G10L17/02, G10L17/04, G10L17/06, G10L17/18, G06N3/088, G06N3/044, G06N3/045
Inventors: 明悦 (Ming Yue), 傅豪 (Fu Hao)
Owner: BEIJING UNIV OF POSTS & TELECOMM