
End-to-end speech synthesis network based on embedded system

An embedded-system speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the problem that inference speed is not real-time, increasing inference speed while reducing the parameter count and the computational cost of the model.

Pending Publication Date: 2021-11-26
DALIAN UNIV OF TECH
Cites: 1; Cited by: 1

AI Technical Summary

Problems solved by technology

However, the splicing (concatenative) method used in the prior invention discussed in the background has difficulty achieving high quality compared with the neural network methods now in wide use, and its inference speed is not real-time.


Examples


Embodiment Construction

[0017] Specific embodiments of the present invention are described below with reference to the accompanying drawings and technical solutions.

[0018] With reference to Figure 1, speech synthesis mainly comprises the following steps:

[0019] Step 1: Convert the text into a mel-spectrogram through the encoder-decoder structure.
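The target representation in Step 1, the mel-spectrogram, can be illustrated with a short NumPy sketch that computes one from a waveform. The source gives no front-end parameters, so the sample rate (22050 Hz), FFT size (1024), hop length (256), 80 mel bands, and the HTK-style mel formula below are common defaults assumed for illustration; the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale (assumed): m = 2595 * log10(1 + f / 700)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 linear-frequency bins to n_mels bands."""
    n_bins = n_fft // 2 + 1
    # n_mels + 2 edge/center points, equally spaced on the mel scale from 0 Hz to Nyquist
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bin_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    fb = np.zeros((n_mels, n_bins))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - left) / (center - left)
        falling = (right - bin_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(signal, sr=22050, n_fft=1024, hop=256, n_mels=80):
    """Frame the signal, take the power spectrum, project onto mel bands, log-compress."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[t * hop:t * hop + n_fft] * window for t in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2   # (frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T           # (frames, n_mels)
    return np.log(mel + 1e-10)                                  # small floor avoids log(0)
```

For one second of a 440 Hz sine sampled at 22050 Hz, this returns an (83, 80) array (frames by mel bands) whose strongest band is centered near 440 Hz. In the patented system the encoder-decoder predicts such frames from text rather than computing them from audio.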

[0020] Step 2: The input sequence first passes through a bank of K 1-D convolutions (in FastSpeech); these convolution kernels effectively model the current input together with its context. The convolution outputs are stacked together and max-pooled along the time axis to increase local invariance, then fed into several fixed-width 1-D convolutions whose output is added to the original input sequence (a residual connection). All convolutions use batch normalization. The result then enters a multi-layer highway network that extracts higher-level features. Finally, a bidirectional GRU on top extracts the contextual features of the sequence. The spectrogram i...
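The layer sequence in Step 2 (convolution bank, max-pooling, projection convolutions with a residual connection, highway network, bidirectional GRU) corresponds to the CBHG module introduced with Tacotron. The NumPy forward pass below is a minimal sketch of that layout, not the patent's implementation. Assumptions: kernel widths 1..K, a stride-1 max-pool of width 2, and two width-3 projection convolutions follow the published Tacotron design; all layer sizes are illustrative; weights are random and untrained, biases are omitted, and batch normalization is simplified to per-sequence statistics. All function names are hypothetical.

```python
import numpy as np


def conv1d(x, w):
    """'Same'-padded 1-D convolution over time. x: (T, C_in), w: (k, C_in, C_out)."""
    k = w.shape[0]
    left = (k - 1) // 2
    xp = np.pad(x, ((left, k - 1 - left), (0, 0)))
    return np.stack([np.tensordot(xp[t:t + k], w, axes=([0, 1], [0, 1]))
                     for t in range(x.shape[0])])


def batch_norm(x, eps=1e-5):
    # Simplification: normalize each channel with this sequence's own statistics;
    # a trained model would use running mean/variance plus a learned scale and shift.
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def max_pool_time(x, size=2):
    """Stride-1 max-pooling along time; edge padding keeps the length at T."""
    xp = np.pad(x, ((0, size - 1), (0, 0)), mode="edge")
    return np.stack([xp[t:t + size].max(axis=0) for t in range(x.shape[0])])


def highway(x, w_h, w_t):
    """One highway layer: y = H(x) * T(x) + x * (1 - T(x)), biases omitted."""
    gate = sigmoid(x @ w_t)
    return relu(x @ w_h) * gate + x * (1.0 - gate)


def gru(x, params, reverse=False):
    """Single-direction GRU over x: (T, D) -> (T, H); biases omitted for brevity."""
    w_z, u_z, w_r, u_r, w_n, u_n = params
    h = np.zeros(u_z.shape[0])
    order = range(x.shape[0] - 1, -1, -1) if reverse else range(x.shape[0])
    outs = []
    for t in order:
        z = sigmoid(x[t] @ w_z + h @ u_z)          # update gate
        r = sigmoid(x[t] @ w_r + h @ u_r)          # reset gate
        n = np.tanh(x[t] @ w_n + (r * h) @ u_n)    # candidate state
        h = (1.0 - z) * n + z * h
        outs.append(h)
    # Backward pass collected states from t = T-1 down to 0; restore time order.
    return np.stack(outs[::-1] if reverse else outs)


def cbhg(x, K=8, bank_ch=32, n_highway=4, gru_hidden=32, seed=0):
    """CBHG-style forward pass with random, untrained weights. x: (T, D)."""
    rng = np.random.default_rng(seed)

    def init(*shape):
        return rng.normal(0.0, 0.1, shape)

    def gru_params():
        # Order: w_z (D, H), u_z (H, H), w_r, u_r, w_n, u_n
        return [init(D, gru_hidden) if i % 2 == 0 else init(gru_hidden, gru_hidden)
                for i in range(6)]

    D = x.shape[1]
    # Conv bank: K 1-D convolutions with kernel widths 1..K, each with BN and ReLU,
    # stacked along the channel axis -> (T, K * bank_ch).
    y = np.concatenate([relu(batch_norm(conv1d(x, init(k, D, bank_ch))))
                        for k in range(1, K + 1)], axis=1)
    # Max-pool along time (stride 1) for local invariance.
    y = max_pool_time(y)
    # Two fixed-width (k = 3) projection convolutions, then residual add to the input.
    y = relu(batch_norm(conv1d(y, init(3, K * bank_ch, D))))
    y = batch_norm(conv1d(y, init(3, D, D))) + x
    # Multi-layer highway network for higher-level features.
    for _ in range(n_highway):
        y = highway(y, init(D, D), init(D, D))
    # Bidirectional GRU: concatenate forward and backward states per time step.
    return np.concatenate([gru(y, gru_params()), gru(y, gru_params(), reverse=True)], axis=1)
```

Calling `cbhg(x)` on a `(T, D)` sequence returns a `(T, 2 * gru_hidden)` array. The second projection maps back to `D` channels because the residual add requires the same shape as the input.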


Abstract

The invention belongs to the technical field of embedded computers and provides an end-to-end speech synthesis network based on an embedded system. Characters are encoded and decoded by a front-end encoder-decoder to generate a mel-spectrogram, and a vocoder then converts the spectrogram into a speech file, so the method is end-to-end. While ensuring that speech quality does not degrade noticeably, the network raises inference speed to achieve real-time performance and is finally deployed on an embedded platform. By adopting an end-to-end network built on recent neural network methods, inference speed is greatly improved while the parameter count and computational cost of the model are reduced, and a feed-forward rhythm encoder makes the rhythm adjustable. Because the method synthesizes audio efficiently in real time, it is deployed on a subway embedded platform.
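The abstract does not say how the feed-forward rhythm encoder makes rhythm adjustable. One well-known mechanism in feed-forward models such as FastSpeech is a length regulator: each token-level encoder state is repeated for its predicted number of mel frames, and scaling the durations by a factor alpha changes the speaking rate. The sketch below shows only that mechanism, as an assumption; `length_regulate` is a hypothetical name and the durations are illustrative numbers, not outputs of the patented network.

```python
import numpy as np


def length_regulate(hidden, durations, alpha=1.0):
    """Expand token-level states to frame level (FastSpeech-style length regulator).

    hidden:    (N, D) array, one encoder state per input token
    durations: (N,) predicted number of mel frames for each token
    alpha:     rate control; alpha > 1 lengthens (slower speech), alpha < 1 shortens
    """
    scaled = np.round(np.asarray(durations, dtype=float) * alpha)
    # Keep at least one frame per token (a choice made here so no token is dropped).
    frames = np.maximum(1, scaled).astype(int)
    return np.repeat(hidden, frames, axis=0)
```

With three tokens and durations `[2, 3, 1]`, the output has 6 frames at `alpha=1.0` and 12 frames at `alpha=2.0`, so the decoder sees the same content stretched over twice as many frames.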

Description

Technical field

[0001] The invention belongs to the technical field of embedded computers and relates to an end-to-end speech synthesis model and vocoder based on an embedded system.

Background

[0002] As voice interaction with machines becomes more common, speech synthesis is used more and more in practice, for example in AI-synthesized anchors and subway broadcasts. However, synthesis quality, and even small changes in the speech, can strongly affect customer experience and preference. High-quality real-time speech synthesis therefore remains a challenging task.

[0003] Current advanced speech synthesis models include statistical parametric neural network models and end-to-end models. Text-to-speech synthesis is usually divided into two stages: the first converts the text into time-aligned features such as mel-spectrograms, and the second converts these time-aligned features into audio samples. ...
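The background sets the goal as high-quality real-time synthesis. A common way to quantify "real-time" is the real-time factor (RTF): the wall-clock time needed to synthesize an utterance divided by the duration of the audio produced, where RTF < 1 means faster than real time. The source does not state which metric the authors used, so this is a generic sketch; `dummy_synthesize` is a hypothetical stand-in for the two-stage text-to-features-to-audio pipeline.

```python
import time
from typing import Callable, Sequence


def real_time_factor(synthesize: Callable[[str], Sequence[float]],
                     text: str, sample_rate: int) -> float:
    """Return RTF = synthesis wall-clock time / duration of the generated audio."""
    start = time.perf_counter()
    audio = synthesize(text)              # text -> waveform samples
    elapsed = time.perf_counter() - start
    duration = len(audio) / sample_rate   # seconds of audio produced
    if duration == 0:
        raise ValueError("synthesizer returned no audio")
    return elapsed / duration


def dummy_synthesize(text: str) -> list:
    # Hypothetical stand-in: returns 0.1 s of silence per character at 22050 Hz.
    return [0.0] * (len(text) * 2205)
```

On an embedded target, the measurement is usually repeated over many utterances and averaged, since a single short call is dominated by timer noise and warm-up effects.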


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L13/04; G10L13/047; G10L25/24; G10L25/30; G10L19/16
CPC: G10L13/04; G10L13/047; G10L25/24; G10L25/30; G10L19/16
Inventor: 李相
Owner: DALIAN UNIV OF TECH