
Many-to-many speech conversion system based on VAE and i-vector under non-parallel text conditions

A voice conversion technology for non-parallel text, applied in the field of signal processing, that addresses the problem that the speaker similarity of converted speech is not ideal.

Active Publication Date: 2021-09-14
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

However, because a one-hot feature is only a speaker identity label and carries no rich speaker-specific information, the converted speech produced by a VAE model based on one-hot features has unsatisfactory speaker similarity. This is the main shortcoming of that algorithm.



Examples


Embodiment 1

[0027] Referring to Figure 1 and Figure 2, this embodiment provides a many-to-many speech conversion system based on VAE and i-vector under non-parallel text conditions. The method has two stages, training and conversion:

[0028] 1. Speaker speech training stage

[0029] 1.1 Obtain the training corpus. The speech corpus used here is VCC2018, which contains 8 source speakers and 4 target speakers. The training corpus is divided into two groups: 4 male speakers and 4 female speakers. For each speaker used in full training, 81 sentences serve as the training corpus and 35 sentences serve as the test corpus for model evaluation;

[0030] 1.2 Use the WORLD speech analysis and synthesis model to extract per-frame speech features from each speaker's sentences: the spectral envelope sp', the logarithmic fundamental frequency log f0, and the aperiodicity ap. Then calculate the energy en of each speech frame and recalculate the spectral envelope, i.e. sp...
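The per-frame quantities in step 1.2 can be illustrated with a minimal sketch. The excerpt does not give the energy formula, so the code below assumes the energy is the sum of the spectral-envelope bins of a frame. The function name `frame_features` and the `eps` guard are hypothetical, and the spectral-envelope recalculation is left out because the source text is truncated at that point.

```python
import math


def frame_features(sp_frame, f0_hz, eps=1e-10):
    """Return (log_f0, energy) for a single analysis frame.

    sp_frame: non-negative spectral-envelope values, one per frequency bin.
    f0_hz:    fundamental frequency in Hz; WORLD reports 0.0 for unvoiced frames.
    """
    # Assumption: frame energy is the sum over all spectral-envelope bins.
    # eps keeps the energy strictly positive for silent frames.
    energy = sum(sp_frame) + eps

    # log F0 is undefined for unvoiced frames, so mark them with None.
    log_f0 = math.log(f0_hz) if f0_hz > 0.0 else None
    return log_f0, energy


if __name__ == "__main__":
    voiced = frame_features([1.0, 2.0, 3.0], 100.0)
    unvoiced = frame_features([1.0, 2.0, 3.0], 0.0)
    print(voiced, unvoiced)
```

In practice the envelope, F0, and aperiodicity would come from a WORLD implementation rather than from hand-written lists; the lists here only keep the sketch self-contained.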



Abstract

The present invention provides a many-to-many speech conversion system based on VAE and the identity feature vector (i-vector) under non-parallel text conditions. It performs speech conversion with a variational autoencoder (VAE) model on a non-parallel corpus and adds the speaker identity feature i-vector to the speaker representation, which can effectively improve the speaker similarity of the converted speech. The advantages of the invention are threefold: 1) the dependence on parallel text is removed, and training requires no alignment operation; 2) the conversion systems for multiple source-target speaker pairs can be integrated into a single conversion model, realizing many-to-many conversion; 3) introducing i-vector features enriches the speaker identity information, thereby improving the speaker similarity of the converted speech and the overall conversion performance.
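As a rough illustration of how an i-vector can enrich a one-hot speaker label, the sketch below builds a speaker representation by concatenating the two and appends it to a latent code as decoder input. Concatenation is an assumption, since the abstract only says the i-vector is added to the speaker representation. The function names, the toy dimensions, and the `i_vectors` lookup table are all hypothetical.

```python
def one_hot(index, num_speakers):
    """One-hot speaker identity label."""
    return [1.0 if i == index else 0.0 for i in range(num_speakers)]


def speaker_representation(index, num_speakers, i_vector):
    """Assumed form: one-hot label concatenated with the speaker's i-vector."""
    return one_hot(index, num_speakers) + list(i_vector)


def decoder_input(latent, target_index, num_speakers, i_vectors):
    """Pair a latent code with the representation of the chosen target speaker.

    Changing only target_index selects a different target speaker for the
    same latent code, which is what lets one model serve many speaker pairs.
    """
    rep = speaker_representation(
        target_index, num_speakers, i_vectors[target_index]
    )
    return list(latent) + rep


if __name__ == "__main__":
    # Toy 2-dimensional i-vectors for two speakers (illustrative values only).
    i_vectors = {0: [0.1, 0.2], 1: [0.3, 0.4]}
    z = [0.5]
    print(decoder_input(z, 0, 2, i_vectors))
    print(decoder_input(z, 1, 2, i_vectors))
```

The point of the sketch is the contrast with a one-hot-only label: two speakers differ in a single bit under one-hot encoding, whereas the i-vector part carries a continuous, speaker-specific description.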

Description

Technical field

[0001] The invention belongs to the technical field of signal processing, and in particular relates to a many-to-many speech conversion system based on VAE and i-vector under non-parallel text conditions.

Background art

[0002] Years of research on speech conversion have produced many classic conversion methods, including the Gaussian mixture model (GMM), frequency warping, deep neural networks (DNN), and methods based on unit selection. However, most of these methods must be trained on parallel corpora in order to establish conversion rules between the spectral features of the source speech and the target speech.

[0003] The speech conversion method based on the variational autoencoder (VAE) model builds a speech conversion system directly from the speaker's identity label. This speech conversion system does not need to align the speech frames of the source speaker and the target speaker during model training...


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L21/013, G10L25/18, G10L25/21, G10L25/30, G10L13/02
CPC: G10L13/02, G10L21/013, G10L25/18, G10L25/21, G10L25/30, G10L2021/0135
Inventor: 李燕萍, 许吉良, 张燕
Owner: NANJING UNIV OF POSTS & TELECOMM