Speech synthesis method, apparatus, device, and storage medium

A speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc., that addresses the low quality of synthesized speech and achieves high-quality speech with a close speaking style.

Pending Publication Date: 2021-04-30

IFLYTEK CO LTD

8 Cites, 13 Cited by

Problems solved by technology

Users want cross-lingual sentences to be synthesized in a consistent and natural voice, but most current end-to-end models assume that the input is in a single language and use only the original text as input to the synthesis model.

[0003] The inventors found that pronunciation phenomena differ across languages: features such as Chinese tone patterns, Japanese pitch accents, and Russian stress are not expressed in the written text. Because existing single-language synthesis models use only the original text as model input, the quality of the speech they synthesize for cross-lingual sentences is low.

Examples

Preparation example construction

[0062] Next, with reference to Figure 1, the speech synthesis method of the present application may comprise the following steps:

[0063] Step S100: Obtain the original text, the phoneme sequence corresponding to the original text, and the speaker characteristics of the speech to be synthesized.

[0064] Specifically, before speech synthesis, it is necessary to obtain the original text to be subjected to speech synthesis. The original text may be text information in a single language, or may be text information in multiple languages, for example, the original text may be text information including two or more languages at the same time.

[0065] Further, different languages have different pronunciation characteristics, and some of these characteristics may not be visible in the written text. For example, Chinese tone patterns, Japanese accent nuclei, and Russian stress cannot be seen in the surface form of the words, but can be displaye...
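As a minimal sketch of the three inputs gathered in step S100, the example below bundles a mixed Chinese and English text with its phoneme sequence and a speaker feature vector. The container `SynthesisInput`, the helper `build_input`, the toy lexicon, and the feature values are all hypothetical names and data introduced for illustration; the patent text does not specify how the phoneme sequence is produced.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

# Hypothetical toy lexicon. Mandarin tones and English stress show up as
# digits on the phonemes even though the written text does not carry them.
TOY_LEXICON: Dict[str, List[str]] = {
    "你好": ["n", "i3", "h", "ao3"],
    "world": ["W", "ER1", "L", "D"],
}


@dataclass
class SynthesisInput:
    """The three inputs of step S100 (container name is an assumption)."""
    text: str
    phonemes: List[str]
    speaker_features: List[float]


def build_input(tokens: Sequence[str],
                speaker_features: Sequence[float]) -> SynthesisInput:
    """Look up each token in the toy lexicon and collect the inputs."""
    phonemes: List[str] = []
    for token in tokens:
        phonemes.extend(TOY_LEXICON[token])
    return SynthesisInput(
        text=" ".join(tokens),
        phonemes=phonemes,
        speaker_features=list(speaker_features),
    )


# Placeholder speaker vector; a real system would derive it from audio.
example = build_input(["你好", "world"], [0.1, -0.3, 0.7])
print(example.text)
print(example.phonemes)
```

The tone digit in `i3` and the stress digit in `ER1` are exactly the kind of information that the original text alone does not expose.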

Embodiment approach

[0092] In an optional implementation manner, the specific implementation process of the above step S120 may include the following steps:

[0093] S1. Perform encoding processing on the fusion feature to obtain an encoded feature.

[0094] Specifically, the fusion feature can be encoded by the text encoder to obtain the encoded feature output by the text encoder.
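To make step S1 concrete, here is a deliberately simple stand-in for a text encoder: a single position-wise layer with a tanh activation that maps each frame of the fusion feature to an encoded frame. This excerpt does not describe the encoder architecture, so the function `encode`, the layer choice, and all sizes are assumptions for illustration only.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def encode(fusion_feature: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Toy encoder: apply tanh(x @ W + b) to every time step independently."""
    encoded: Matrix = []
    for frame in fusion_feature:  # one frame per time step
        row = []
        for j, b in enumerate(bias):
            total = sum(x * weight[i][j] for i, x in enumerate(frame)) + b
            row.append(math.tanh(total))
        encoded.append(row)
    return encoded


random.seed(0)
T, D, H = 8, 4, 6  # time steps, fusion size, encoded size (all hypothetical)
fusion_feature = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
weight = [[random.gauss(0.0, 0.1) for _ in range(H)] for _ in range(D)]
bias = [0.0] * H

encoded_feature = encode(fusion_feature, weight, bias)
print(len(encoded_feature), len(encoded_feature[0]))  # 8 6
```

The sequence length is preserved and only the per-frame dimension changes, which is the property a downstream decoder would typically rely on.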

[0095] Further, existing end-to-end speech synthesis models all assume that the input is in a single language. As a result, when the input text mixes different languages, these models often synthesize incorrect speech or even skip words outright. At the same time, it is difficult to obtain speech from the same speaker in different languages. To prevent the model from erroneously learning a correlation between speaker characteristics and language, which would cause the speaker to switch within the synthesized speech, this embodiment provides a metho...

Abstract

The invention discloses a speech synthesis method, apparatus, device, and storage medium. The method comprises: obtaining an original text, a phoneme sequence corresponding to the original text, and the speaker features of the speech to be synthesized; fusing the features of the original text and the phoneme sequence to obtain a fusion feature; performing encoding and decoding based on the fusion feature and the speaker features to obtain an acoustic spectrum; and performing speech synthesis based on the acoustic spectrum to obtain synthesized speech. Because the fusion feature combines the original text with the phoneme sequence, the input information is enriched and pronunciation information specific to each language can be captured. For example, Chinese tone patterns, Japanese accent nuclei, and Russian stress can be represented through the phoneme sequence. As a result, the synthesized speech is more natural, matches the pronunciation characteristics of the corresponding language, and is of higher quality.
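The feature fusion described in the abstract can be sketched as follows. The abstract does not say how the fusion is performed, so this example makes two loud assumptions: each phoneme is already paired with the character it belongs to, and fusion is a plain concatenation of a character embedding and a phoneme embedding. The embedding tables and the function `fuse` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical 2-dimensional embedding tables; a trained model would learn these.
CHAR_EMBEDDING: Dict[str, List[float]] = {
    "你": [0.1, 0.2],
    "好": [0.3, 0.4],
}
PHONEME_EMBEDDING: Dict[str, List[float]] = {
    "n": [1.0, 0.0],
    "i3": [0.0, 1.0],
    "h": [1.0, 1.0],
    "ao3": [0.5, 0.5],
}


def fuse(aligned: List[Tuple[str, str]]) -> List[List[float]]:
    """Concatenate character and phoneme embeddings for each aligned pair.

    Concatenation is an assumed fusion rule, not one stated in the source.
    """
    return [CHAR_EMBEDDING[ch] + PHONEME_EMBEDDING[ph] for ch, ph in aligned]


# Assumed alignment: every phoneme is tagged with its source character.
aligned = [("你", "n"), ("你", "i3"), ("好", "h"), ("好", "ao3")]
fusion_feature = fuse(aligned)
print(len(fusion_feature), len(fusion_feature[0]))  # 4 4
```

Each fused frame carries both the identity of the written character and the tone-bearing phoneme, which is the enrichment of the input that the abstract credits for the improved quality.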

Description

Technical field

[0001] The present application relates to the technical field of speech signal processing, and more particularly, to a speech synthesis method, apparatus, device, and storage medium.

Background

[0002] In recent years, end-to-end speech synthesis systems have achieved good results and can generate near-human synthetic speech in real time. With the development of globalization, in important speech synthesis application scenarios such as social media, informal information, and voice navigation, the mixing of different languages in text or speech is becoming more and more common. Users want these cross-lingual sentences to be synthesized in a consistent and natural voice, but most current end-to-end models assume that the input is monolingual and use only the raw text as the input to the synthesis model.

[0003] The inventor of the present case found that different languages have different pronunciation phenomena, su...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G10L13/02, G10L17/02, G10L17/04, G10L19/00, G10L19/02

CPC: G10L13/02, G10L17/02, G10L17/04, G10L19/0018, G10L19/02

Inventors: 陈梦楠, 江源, 高丽, 祖漪清

Owner: IFLYTEK CO LTD
