Label-free specific speaker speech synthesis method and device

A speaker-specific speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the problems that speech annotation is expensive, time-consuming to collect, and inefficient, and achieves the effect of reducing manpower.

Pending Publication Date: 2021-06-22
JIANGSU JINLING TECH GRP CORP

AI Technical Summary

Problems solved by technology

[0005] Training the vocoder model mentioned above requires only high-fidelity speech and no annotation information. Training the mel-spectrogram prediction network, however, requires a certain amount of high-quality annotated data. Speech annotation is expensive, time-consuming to collect, and inefficient, which makes it inconvenient to synthesize the voice of a specific speaker.
[0006] As a result, the end-to-end TTS system's dependence on speech annotation limits the use of speech synthesis where no annotated resources are available.
[0007] Even when a large amount of manually labeled sample data is available for an end-to-end TTS system, human labeling errors may lower the data quality, which in turn lowers the quality of the synthesized speech.



Examples


Detailed description of embodiments

[0051] The present invention will be further illustrated below in conjunction with the accompanying drawings and specific embodiments. This embodiment is implemented on the premise of the technical solution of the present invention. It should be understood that these embodiments are only used to illustrate the present invention and are not intended to limit the scope of the present invention.

[0052] As shown in Figure 1, an embodiment of the present invention provides an annotation-free speaker-specific speech synthesis method, including:

[0053] Step S1: Obtain the text to be processed. The text to be processed is the content to be synthesized in the voice of a specific speaker, and it can be Chinese words, phrases, sentences or paragraphs.

[0054] Step S2: Extract the phoneme posterior probability features (PPGs) corresponding to the text to be processed through the phoneme posterior probability prediction network. Phoneme posterior probability features (PPGs) in the...
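As a rough illustration of what a phoneme posterior probability feature looks like, the sketch below builds a small frames-by-phonemes matrix in which every row is a probability distribution over phoneme classes. It is not the prediction network of the invention: the phoneme inventory `PHONEMES`, the helper `to_ppg`, and the score values are all hypothetical, and the softmax normalization is only an assumption about how per-frame scores might be turned into posteriors.

```python
import math

# Hypothetical toy phoneme inventory; a real system would use a full phone set.
PHONEMES = ["sil", "n", "i", "h", "ao"]

def softmax(scores):
    """Turn one frame of raw scores into a probability distribution."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def to_ppg(frame_scores):
    """Map per-frame score vectors to a PPG matrix (frames x phonemes)."""
    return [softmax(frame) for frame in frame_scores]

# Made-up scores for three frames, for illustration only.
frame_scores = [
    [4.0, 0.5, 0.1, 0.2, 0.1],
    [0.2, 3.5, 1.0, 0.1, 0.3],
    [0.1, 0.8, 3.9, 0.2, 0.4],
]

ppg = to_ppg(frame_scores)

# Most likely phoneme in each frame.
best = [PHONEMES[row.index(max(row))] for row in ppg]
```

Each row of `ppg` sums to 1, so the matrix describes, frame by frame, how likely each phoneme is without referring to any particular speaker's voice.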


Abstract

The invention discloses an annotation-free speaker-specific speech synthesis method and device. The method comprises the steps of obtaining a to-be-processed text; extracting the phoneme posterior probability features corresponding to the to-be-processed text through a phoneme posterior probability prediction network; extracting the mel-spectrogram features corresponding to the phoneme posterior probability features through a mel-spectrogram prediction network; and synthesizing the voice of the target speaker corresponding to the mel-spectrogram features through a vocoder model. No text annotation of the target voice is needed, so annotation-free synthesis of the target voice is realized. The phoneme posterior probability features serve as a bridge between the text and the mel-spectrogram, and the training data consist of open-source annotated voice data and a small amount of unannotated voice data from the target speaker, so that manpower, time and capital costs are greatly reduced.
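The three stages named in the abstract can be sketched as a simple function chain: text to phoneme posterior probability features, those features to a mel-spectrogram, and the mel-spectrogram to a waveform. This is a minimal sketch under stated assumptions, not the patented implementation: `synthesize`, the three `dummy_*` stages, and the constants `N_PHONEMES`, `N_MELS` and `HOP` are hypothetical placeholders that only reproduce the data flow and the shapes involved.

```python
from typing import Callable, List

Matrix = List[List[float]]  # frames x feature dimensions

def synthesize(
    text: str,
    text_to_ppg: Callable[[str], Matrix],
    ppg_to_mel: Callable[[Matrix], Matrix],
    vocoder: Callable[[Matrix], List[float]],
) -> List[float]:
    """Chain the stages from the abstract: text -> PPG -> mel -> waveform."""
    ppg = text_to_ppg(text)
    mel = ppg_to_mel(ppg)
    return vocoder(mel)

# Hypothetical sizes, chosen only to keep the example small.
N_PHONEMES = 4  # phoneme classes per PPG frame
N_MELS = 2      # mel bands per spectrogram frame
HOP = 3         # waveform samples produced per frame

def dummy_text_to_ppg(text: str) -> Matrix:
    # Placeholder: one frame per character with a uniform posterior.
    return [[1.0 / N_PHONEMES] * N_PHONEMES for _ in text]

def dummy_ppg_to_mel(ppg: Matrix) -> Matrix:
    # Placeholder: one all-zero mel frame per PPG frame.
    return [[0.0] * N_MELS for _ in ppg]

def dummy_vocoder(mel: Matrix) -> List[float]:
    # Placeholder: HOP samples of silence per mel frame.
    return [0.0] * (len(mel) * HOP)

wave = synthesize("你好", dummy_text_to_ppg, dummy_ppg_to_mel, dummy_vocoder)
```

Passing the stages in as arguments keeps the chain independent of how each model is trained, which matches the abstract's point that the intermediate features are the only link between the text side and the speaker-specific acoustic side.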

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, in particular to a method and device for speaker-specific speech synthesis without annotation.

Background art

[0002] Speech synthesis technology converts input text into audible sound. The higher the fidelity of the synthesized sound, the more popular it is.

[0003] With continuous industry breakthroughs in voice technology and rising user awareness and expectations in recent years, people have posed more and more challenges to speech synthesis technology. For example, users want the synthesized voice to sound like a specific speaker's voice, and want to add further speaker-specific voices easily.

[0004] In recent years, the end-to-end TTS system has achieved results comparable to the human voice and has become the mainstream speech synthesis framework. Generally speaking, the end-to-end TTS system can be di...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G10L13/02, G10L25/24, G10L19/16, G10L25/30
CPC: G10L13/02, G10L25/24, G10L19/16, G10L25/30
Inventor: 胡俊鑫, 梁钦, 段轶, 刘均伟, 包静亮
Owner: JIANGSU JINLING TECH GRP CORP