
Voice processing method, device and equipment and storage medium

A voice processing method applied in the computer field. It addresses two problems: retraining a voice conversion model consumes large amounts of computing resources and time, and a lack of training data leaves the model insufficiently retrained. The effects are reduced computing resource occupation and time consumption, a lower application threshold, and improved voice processing efficiency.

Active Publication Date: 2021-04-27
BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
6 Cites, 4 Cited by

AI Technical Summary

Problems solved by technology

[0004] The present disclosure provides a speech processing method, apparatus, device and storage medium to solve at least one of the following problems in the related art: the heavy consumption of computer resources and time caused by retraining the speech conversion model when the target speaker changes, and the lack of sufficient training data to retrain the speech conversion model.



Examples


Embodiment approach

[0203] As an optional implementation manner, the reconstruction module includes:

[0204] The reconstruction sub-module is configured to call the vocoder to reconstruct the waveform of the target speech feature to obtain the converted target speech.

[0205] As an optional implementation manner, the speech processing model is obtained through training in the following manner:

[0206] Obtain a training set, the training set includes at least one speech sample pair, each speech sample pair includes a first speech sample and a second speech sample, and the first speech sample and the second speech sample are different utterances of the same speaker;

[0207] Invoke the encoder in the basic speech processing model to encode the first speech sample and the second speech sample in each speech sample pair, obtaining the first sample feature corresponding to the first speech sample and the second sample feature corresp...
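The training set described in [0206] can be illustrated with a minimal Python sketch. It assumes the corpus is already grouped by speaker; the function name `build_sample_pairs`, the dictionary layout and the file names are hypothetical and do not come from the patent. Every pair it produces holds two different utterances of the same speaker.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def build_sample_pairs(
    utterances_by_speaker: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Pair up different utterances of the same speaker.

    Each returned tuple is (first_speech_sample, second_speech_sample).
    Speakers with fewer than two distinct utterances contribute no pairs.
    """
    pairs: List[Tuple[str, str]] = []
    for utterances in utterances_by_speaker.values():
        # Drop duplicate entries while keeping the original order.
        unique = list(dict.fromkeys(utterances))
        # Every unordered pair of distinct utterances from this speaker.
        pairs.extend(combinations(unique, 2))
    return pairs


# Hypothetical corpus: speaker id -> utterance files.
corpus = {
    "speaker_a": ["a_001.wav", "a_002.wav", "a_003.wav"],
    "speaker_b": ["b_001.wav", "b_002.wav"],
    "speaker_c": ["c_001.wav"],
}
training_set = build_sample_pairs(corpus)
```

Pairs never mix speakers, so the two samples in a pair share the speaker's timbre while differing in spoken content.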


Abstract

The invention relates to a voice processing method, apparatus, device and storage medium. The method comprises: obtaining a first voice and a second voice to be processed; calling an encoder in a voice processing model, obtained by optimization training on at least one utterance of a target speaker, to encode the obtained voices, yielding a first feature that represents text information independent of speaker identity and a second feature that represents the timbre of the target speaker; and performing decoding and voice reconstruction based on the first feature and the second feature to obtain a target voice after timbre conversion. Because the voice processing model is end-to-end, it does not need a large number of target speaker utterances: it can model the target speaker's timbre from only a small number of utterances, which reduces the computing resources and time consumed by model training.
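The data flow in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the encoder is split into two hypothetical callables (`encode_content`, `encode_timbre`) for readability, and the toy stand-ins below are simple arithmetic on lists, not the neural encoder, decoder or vocoder of the disclosure.

```python
from typing import Callable, List

Signal = List[float]


def convert_voice(
    first_voice: Signal,
    second_voice: Signal,
    encode_content: Callable[[Signal], Signal],
    encode_timbre: Callable[[Signal], Signal],
    decode: Callable[[Signal, Signal], Signal],
    vocoder: Callable[[Signal], Signal],
) -> Signal:
    """Encode both voices, decode target features, reconstruct a waveform."""
    first_feature = encode_content(first_voice)    # speaker-independent content
    second_feature = encode_timbre(second_voice)   # target speaker timbre
    target_feature = decode(first_feature, second_feature)
    return vocoder(target_feature)


# Toy stand-ins (assumptions for illustration only).
def toy_encode_content(voice: Signal) -> Signal:
    mean = sum(voice) / len(voice)
    return [sample - mean for sample in voice]  # shape of the signal, offset removed


def toy_encode_timbre(voice: Signal) -> Signal:
    return [sum(voice) / len(voice)]  # a single "timbre" value: the mean offset


def toy_decode(content: Signal, timbre: Signal) -> Signal:
    return [sample + timbre[0] for sample in content]


def toy_vocoder(features: Signal) -> Signal:
    return list(features)  # identity: features are already samples here


converted = convert_voice(
    [1.0, 2.0, 3.0],   # first voice: supplies the content
    [10.0, 10.0],      # second voice: supplies the timbre
    toy_encode_content,
    toy_encode_timbre,
    toy_decode,
    toy_vocoder,
)
```

In the toy run, the output keeps the shape of the first voice and takes its offset from the second, mirroring how the converted speech keeps the content of one input and the timbre of the other.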

Description

Technical field

[0001] The present disclosure relates to the field of computer technology, in particular to a voice processing method, apparatus, device and storage medium.

Background

[0002] Speech conversion refers to converting the timbre of the original speaker's speech into the timbre of the target speaker while keeping the linguistic content unchanged. Speech conversion plays an important role in video voice changing, video dubbing, human-computer interaction and other fields.

[0003] In related technologies, existing speech recognition systems are usually trained on large data sets. When the target speaker changes, a large amount of data must be obtained to retrain a voice conversion model. This not only consumes a lot of computer resources and time; in some special scenarios, especially when voice data of the new target speaker is scarce, the data is also not sufficient to retrain a speech conversion model to a new target speake...


Application Information

IPC(8): G10L21/013
CPC: G10L21/013, G10L2021/0135
Inventor: 张颖
Owner: BEIJING DAJIA INTERNET INFORMATION TECH CO LTD