Speech synthesis method and device, equipment and storage medium

A speech synthesis technology, applied to speech synthesis methods, devices, computer equipment, and storage media. It addresses the low degree of fit between synthesized speech and the real human voice, and the poor user experience that results, with the aim of improving user experience.

Pending Publication Date: 2021-09-03
PING AN TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0004] The technical problem to be solved by the present invention is that speech produced by current speech synthesis technology fits the real human voice poorly, which results in a poor user experience.


Examples

Embodiment 1

[0031] Refer to Figure 1, which is a schematic flowchart of a speech synthesis method disclosed in an embodiment of the present invention. As shown in Figure 1, the speech synthesis method may include the following operations:

[0032] 101. Input the reference speech sequence into a preset speech prosody analysis model for analysis to obtain speech prosody feature information.

[0033] In step 101 above, the reference speech sequence may be the speech that the user wants the synthesized speech to resemble. For example, if the user wants the synthesized voice to be closer to the voice of person A, a real recording of person A speaking can be converted into the reference speech sequence. The prosody of speech includes its intensity, pitch, and duration, and the prosody of different speakers usually differs to some extent. The speech prosody analysis model analyzes the reference speech sequence, and the speech prosody fe...
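As a rough illustration of the prosodic attributes named in step 101 (intensity, pitch, duration), the sketch below computes hand-crafted versions of them from a raw waveform. This is not the preset speech prosody analysis model of the patent, whose internals are not described in this excerpt; the function names, the RMS intensity measure, and the autocorrelation pitch estimator are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List


def rms_intensity(samples: List[float]) -> float:
    """Root-mean-square amplitude, used here as a simple intensity measure."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def estimate_pitch_hz(samples: List[float], sample_rate: int,
                      fmin: float = 50.0, fmax: float = 400.0) -> float:
    """Estimate pitch by picking the lag with the largest autocorrelation.

    Hypothetical helper: only lags that correspond to fmin..fmax are searched.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag


def describe_prosody(samples: List[float], sample_rate: int) -> Dict[str, float]:
    """Bundle the three illustrative prosodic attributes into one record."""
    return {
        "intensity_rms": rms_intensity(samples),
        "pitch_hz": estimate_pitch_hz(samples, sample_rate),
        "duration_s": len(samples) / sample_rate,
    }


# Synthetic stand-in for a reference speech sequence: 0.1 s of a 200 Hz tone.
SAMPLE_RATE = 8000
reference = [0.5 * math.sin(2 * math.pi * 200 * n / SAMPLE_RATE)
             for n in range(800)]
features = describe_prosody(reference, SAMPLE_RATE)
```

Two speakers reading the same sentence would typically yield different values for these attributes, which is the kind of speaker-dependent difference the paragraph above refers to.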

Embodiment 2

[0066] Refer to Figure 2, which is a schematic structural diagram of a speech synthesis device disclosed in an embodiment of the present invention. As shown in Figure 2, the speech synthesis device may include:

[0067] The speech prosody analysis module 201 is used for inputting the reference speech sequence to a preset speech prosody analysis model for analysis to obtain speech prosody feature information;

[0068] The text prosody analysis module 202 is used for inputting the target text sequence into a preset text prosody analysis model for analysis to obtain text prosody feature information;

[0069] A merge processing module 203, configured to perform preset merge processing on the speech prosody feature information and the text prosody feature information, to obtain prosody information for recording the prosody of the target speech to be synthesized;

[0070] A speech synthesis module 204, configured to synthesize the target speech based on the target text...
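The data flow between modules 201 to 204 can be sketched as follows. The class name, the callable signatures, and all four placeholder implementations are hypothetical; in particular, the "preset merge processing" is not specified in this excerpt, so plain concatenation is used here purely as a stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class SpeechSynthesisDevice:
    """Wires four pluggable stages together in the order of modules 201-204."""
    analyze_speech_prosody: Callable[[Sequence[float]], Vector]  # module 201
    analyze_text_prosody: Callable[[str], Vector]                # module 202
    merge: Callable[[Vector, Vector], Vector]                    # module 203
    synthesize: Callable[[str, Vector], Vector]                  # module 204

    def run(self, reference_speech: Sequence[float], target_text: str) -> Vector:
        speech_features = self.analyze_speech_prosody(reference_speech)
        text_features = self.analyze_text_prosody(target_text)
        prosody = self.merge(speech_features, text_features)
        return self.synthesize(target_text, prosody)


# Placeholder stages (assumptions, not the patent's models).
def toy_speech_prosody(speech: Sequence[float]) -> Vector:
    # Mean absolute amplitude as a one-number "speech prosody feature".
    return [sum(abs(s) for s in speech) / len(speech)]


def toy_text_prosody(text: str) -> Vector:
    # Text length as a one-number "text prosody feature".
    return [float(len(text))]


def concat_merge(speech_features: Vector, text_features: Vector) -> Vector:
    # Stand-in for the unspecified preset merge processing.
    return speech_features + text_features


def toy_synthesize(text: str, prosody: Vector) -> Vector:
    # Emits one fake "sample" per character, scaled by the merged prosody.
    return [sum(prosody)] * len(text)


device = SpeechSynthesisDevice(
    toy_speech_prosody, toy_text_prosody, concat_merge, toy_synthesize)
target_speech = device.run([0.5, -0.5], "hi")
```

Keeping each stage behind a callable mirrors the module split in the device description: any one stage, for example the merge, can be replaced without touching the others.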

Embodiment 3

[0086] Refer to Figure 3, which is a schematic structural diagram of a computer device disclosed in an embodiment of the present invention. As shown in Figure 3, the computer device may include:

[0087] A memory 301 storing executable program codes;

[0088] A processor 302 connected to the memory 301;

[0089] The processor 302 invokes the executable program code stored in the memory 301 to execute the steps in the speech synthesis method disclosed in Embodiment 1 of the present invention.


Abstract

The invention discloses a speech synthesis method. The method comprises the following steps: inputting a reference speech sequence into a preset speech prosody analysis model for analysis to obtain speech prosody feature information; inputting a target text sequence into a preset text prosody analysis model for analysis to obtain text prosody feature information; performing preset merge processing on the speech prosody feature information and the text prosody feature information to obtain prosody information that records the prosody of the target speech to be synthesized; and synthesizing the target speech based on the target text sequence and the prosody information. The method can therefore combine the speech prosody of the reference speech with the text prosody of the target text during synthesis, so that the synthesized speech is closer to the real human voice and the user experience is improved. The invention also relates to the technical field of blockchain.

Description

Technical field

[0001] The invention relates to the technical field of speech synthesis, and in particular to a speech synthesis method, device, computer equipment and storage medium.

Background

[0002] With the development of computer technology, speech synthesis has matured and is widely used in real life, for example in intelligent customer service, mobile phone voice assistants, and map navigation. As a result, users' expectations of speech synthesis keep rising. At present, users mainly care whether the synthesized voice is close enough to a real human voice and whether it sounds natural and realistic. Traditional speech synthesis technology focuses mainly on how to convert text sequences into speech sequences and pays less attention to whether the prosody of the converted speech sequences is appropriate. Due to the lack of control over the prosody of synthesiz...


Application Information

IPC(8): G10L13/10
CPC: G10L13/10
Inventors: 张旭龙, 王健宗
Owner: PING AN TECH (SHENZHEN) CO LTD