Many-to-many speaker conversion method based on Transitive STARGAN

A speaker conversion method, applied in speech analysis, speech recognition, instruments, etc., that addresses problems such as network degradation during training. Its effects include faster convergence, less loss of semantic features, and improved learning ability.

Pending Publication Date: 2020-07-17
NANJING UNIV OF POSTS & TELECOMM

AI Technical Summary

Problems solved by technology

[0007] Purpose of the invention: the technical problem to be solved by the present invention is to provide a Transitive STARGAN-based many-to-many speaker conversion method and a computer storage medium that resolve the network degradation that existing methods suffer during training. A multi-layer TransNet is built between the encoding and decoding networks of the STARGAN generator. This improves the decoding network's ability to learn semantic features at different scales, lets the model learn deep spectral features, and improves the quality of the spectra generated by the decoding network. The model thereby learns semantic features and speaker-specific features more fully, which improves the speaker similarity and speech quality of the converted, synthesized speech.

Examples

Embodiment Construction

[0038] The transfer network (TransNet) has been used in the image field, where it helps preserve finer detail and edge features in the converted image. The present invention applies the idea of the transfer network to voice conversion, using it inside the generator network to pass feature information between different scales, which strengthens the generator's learning and expressive ability. The transfer network compensates for the semantic information the generator loses in the encoding and decoding stages, so the model can fully learn the deep features of the spectrum. The generated spectrum therefore carries more detail, the blurred spectral detail otherwise produced by the generator network is avoided, and the spectral generation quality of the decoding network improves. This structure further reduces the difficulty the generator network has in learning semantics, thereby improving the naturalness and cla...

Abstract

The invention discloses a many-to-many speaker conversion method based on Transitive STARGAN. A STARGAN generator is combined with a transfer network, so that features extracted by the encoding network are passed to the corresponding layer of the decoding network within the generator. This improves the decoding network's ability to learn semantic features at different scales, lets the model learn deep spectral features, and improves the spectral generation quality of the decoding network. Semantic features and speaker-specific features are learned more fully, which improves the speaker similarity and speech quality of the converted, synthesized speech. The method addresses the poor speaker similarity and naturalness of speech converted by the STARGAN model and achieves high-quality many-to-many speaker conversion under non-parallel text conditions.
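The idea of passing encoder features to the decoder layer of the same scale can be illustrated with a toy sketch. This is not the patented network: the averaging encoder, the repeating decoder, and the element-wise averaging used to fuse features are all assumptions chosen for brevity, and the names `downsample`, `upsample`, and `generator` are hypothetical. The real generator would use learned convolutional layers and speaker conditioning, neither of which is modeled here.

```python
from typing import List

Frame = List[float]


def downsample(x: Frame) -> Frame:
    """Toy encoder layer: halve the length by averaging adjacent pairs."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def upsample(x: Frame) -> Frame:
    """Toy decoder layer: double the length by repeating each value."""
    return [v for v in x for _ in range(2)]


def generator(x: Frame, depth: int = 2, use_transfer: bool = True) -> Frame:
    """Encode then decode a 1-D feature vector.

    With use_transfer=True, the feature saved at each encoder scale is
    fused into the decoder layer that has the same length.
    """
    if len(x) % (2 ** depth) != 0:
        raise ValueError("input length must be divisible by 2**depth")

    saved: List[Frame] = []
    h = x
    for _ in range(depth):
        saved.append(h)          # feature at this scale, before downsampling
        h = downsample(h)

    for _ in range(depth):
        h = upsample(h)
        skip = saved.pop()       # encoder feature with the same length as h
        if use_transfer:
            # Assumption: plain averaging stands in for a learned fusion.
            h = [(a + b) / 2.0 for a, b in zip(h, skip)]
    return h


def l1_error(a: Frame, b: Frame) -> float:
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))


x = [1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0]
plain = generator(x, use_transfer=False)
transfer = generator(x, use_transfer=True)
print(l1_error(plain, x), l1_error(transfer, x))
```

In this toy setting the output with transfer connections stays closer to the fine structure of the input than the output without them, which mirrors the claim that features lost during encoding can be compensated at the decoder.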

Description

Technical field

[0001] The invention relates to a many-to-many speaker conversion method, in particular to a many-to-many speaker conversion method based on Transitive STARGAN.

Background technique

[0002] Voice conversion is a research branch of speech signal processing that developed from speech analysis, speech synthesis, and speaker recognition. Its goal is to change the voice characteristics of the source speaker to those of the target speaker while retaining the semantic information, that is, to make the source speaker's speech sound like the target speaker's after conversion.

[0003] After years of research on voice conversion, many classic conversion methods have emerged. According to the training corpus, they can be divided into conversion methods under parallel text conditions and conversion methods under non-parallel text condition...

Application Information

IPC(8): G10L15/08, G10L15/16, G10L15/18, G10L15/06
CPC: G10L15/08, G10L15/16, G10L15/1815, G10L15/063, G10L2015/0631
Inventor: 李燕萍, 何铮韬
Owner: NANJING UNIV OF POSTS & TELECOMM