Label-free specific speaker speech synthesis method and device

A speaker-specific speech synthesis technology, applicable to speech synthesis, speech analysis, instruments, and related fields. It addresses the cost, time, and inefficiency of collecting speech annotations, and reduces the manpower required.

Pending Publication Date: 2021-06-22

JIANGSU JINLING TECH GRP CORP

Problems solved by technology

[0005] Training the vocoder model mentioned above requires only high-fidelity speech and no annotation information. Training the mel-spectrogram prediction network, however, requires a certain amount of high-quality annotated data. Voice annotation is expensive, time-consuming to collect, and inefficient, so synthesizing the voice of a specific speaker is neither convenient nor efficient.

[0006] As a result, the end-to-end TTS system's demand for speech annotation limits the application of speech synthesis where annotation resources are unavailable.

[0007] For an end-to-end TTS system, even when a large amount of manually labeled sample data is available, human labeling errors may lower data quality and thus degrade the quality of the synthesized speech.

Embodiment Construction

[0051] The present invention will be further illustrated below in conjunction with the accompanying drawings and specific embodiments. This embodiment is implemented on the premise of the technical solution of the present invention. It should be understood that these embodiments are only used to illustrate the present invention and are not intended to limit the scope of the present invention.

[0052] As shown in Figure 1, an embodiment of the present invention provides an annotation-free speaker-specific speech synthesis method, including:

[0053] Step S1: Obtain the text to be processed. The text to be processed is the voice content to be synthesized for a specific speaker, and may be Chinese words, phrases, sentences, or paragraphs.

[0054] Step S2: Extract the phoneme posterior probability features corresponding to the text to be processed through the phoneme posterior probability prediction network. Phoneme posterior probability features (PPGs) in the...

Abstract

The invention discloses a label-free specific speaker speech synthesis method and device. The method comprises the steps of obtaining a to-be-processed text; extracting phoneme posterior probability features corresponding to the to-be-processed text through a phoneme posterior probability prediction network; extracting mel-spectrogram features corresponding to the phoneme posterior probability features through a mel-spectrogram prediction network; and synthesizing the voice of the target speaker corresponding to the mel-spectrogram features through a vocoder model. Text annotation information for the target voice is not needed, so annotation-free synthesis of the target voice is realized. The phoneme posterior probability features serve as a bridge between the text and the mel-spectrogram, and the training data consist of open-source annotated voice data plus a small amount of unannotated voice data from the target speaker, so manpower, time, and capital costs are greatly reduced.
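The three-stage flow described in the abstract (text to phoneme posterior probability features, then to mel-spectrogram features, then to a waveform) can be sketched as below. This is only an illustrative outline of the data flow, not the patented implementation: every function name, constant, and the one-frame-per-character simplification is a hypothetical stand-in, and the random or linear "models" take the place of the trained networks the patent describes.

```python
import math
import random

N_PHONEMES = 4  # hypothetical size of the phoneme inventory
N_MELS = 8      # hypothetical number of mel bands


def softmax(xs):
    """Turn raw scores into a probability distribution."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def text_to_ppg(text, rng):
    """Stand-in for the phoneme posterior probability prediction network.

    Assumption: one frame per character; each frame is a posterior
    distribution over N_PHONEMES phoneme classes.
    """
    return [softmax([rng.gauss(0, 1) for _ in range(N_PHONEMES)]) for _ in text]


def ppg_to_mel(ppg, weights):
    """Stand-in for the mel-spectrogram prediction network (a linear map)."""
    return [
        [sum(p * w for p, w in zip(frame, band)) for band in weights]
        for frame in ppg
    ]


def vocoder(mel, hop=4):
    """Stand-in vocoder: emits `hop` samples per mel frame."""
    wav = []
    for frame in mel:
        energy = sum(frame) / len(frame)
        for _ in range(hop):
            n = len(wav)
            wav.append(energy * math.sin(2 * math.pi * 0.05 * n))
    return wav


def synthesize(text, seed=0):
    """Run the text -> PPG -> mel -> waveform chain end to end."""
    rng = random.Random(seed)
    weights = [[rng.random() for _ in range(N_PHONEMES)] for _ in range(N_MELS)]
    ppg = text_to_ppg(text, rng)
    mel = ppg_to_mel(ppg, weights)
    return ppg, mel, vocoder(mel)
```

In this sketch only `text_to_ppg` touches text, which mirrors the abstract's point that the phoneme posterior probability features bridge text and mel-spectrogram: the later stages could in principle be trained on the target speaker's unannotated audio alone.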

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, and in particular to a method and device for annotation-free speech synthesis of a specific speaker.

Background

[0002] Speech synthesis technology converts input text information into audible sound information. The higher the fidelity of the synthesized sound, the more it is favored.

[0003] With continuing industry breakthroughs in voice technology and rising user awareness and needs in recent years, users have placed increasing demands on speech synthesis technology. For example, users want the synthesized voice to sound like a specific speaker's voice, and want to easily add further speaker-specific voices.

[0004] In recent years, end-to-end TTS systems have achieved quality comparable to the human voice and have become the mainstream speech synthesis framework. Generally speaking, the end-to-end TTS system can be di...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L13/02, G10L25/24, G10L19/16, G10L25/30

CPC: G10L13/02, G10L25/24, G10L19/16, G10L25/30

Inventor: 胡俊鑫, 梁钦, 段轶, 刘均伟, 包静亮

Owner: JIANGSU JINLING TECH GRP CORP
