End-to-end speech synthesis network based on embedded system

Embedded-system and speech synthesis technology, applied in speech synthesis, speech analysis, instruments, etc. It addresses the problem that inference speed is not real-time, and it increases inference speed while reducing the parameter count and the model's computation.

Pending Publication Date: 2021-11-26

DALIAN UNIV OF TECH



AI Technical Summary

Problems solved by technology

The splicing (concatenative) synthesis method has difficulty achieving high quality compared with the neural network methods now in wide use, and its inference speed is not real-time.




Embodiment Construction

[0017] Specific embodiments of the present invention are described below with reference to the accompanying drawings and technical solutions.

[0018] With reference to Figure 1, speech synthesis mainly includes the following steps:

[0019] Step 1: Convert the text into a mel-spectrogram through the encoder-decoder structure.

[0020] Step 2: A continuous sequence is input and first passes through K 1-D convolutions in FastSpeech; these convolution kernels effectively model the current and contextual information. The convolution outputs are stacked together and max-pooled along the time axis to increase local invariance, then passed to several fixed-width 1-D convolutions whose outputs are added to the original input sequence (a residual connection). All convolutions use batch normalization. The result enters a multi-layer highway network to extract higher-level features. Finally, a bidirectional GRU is added on top to extract the contextual features of the sequence. The spectrogram i...
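The data flow described in Step 2 (convolution bank, max-pooling along time, fixed-width projection with a residual connection, highway layer) can be sketched in plain Python. This is a minimal illustration on a single-channel sequence, not the patent's implementation: the averaging kernels, the projection kernel, the gate value, and all function names are assumptions made for the example, and batch normalization and the bidirectional GRU are omitted.

```python
import math

def conv1d_same(seq, kernel):
    """1-D convolution over time, zero-padded so the output length equals the input length."""
    left = (len(kernel) - 1) // 2
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - left
            if 0 <= idx < len(seq):
                acc += w * seq[idx]
        out.append(acc)
    return out

def conv_bank(seq, K):
    """Apply K filters of widths 1..K and stack their outputs.
    Averaging kernels stand in for learned weights (assumption)."""
    return [conv1d_same(seq, [1.0 / k] * k) for k in range(1, K + 1)]

def max_pool_time(seq, width=2):
    """Max-pool along the time axis with stride 1, preserving sequence length."""
    n = len(seq)
    return [max(seq[t:min(t + width, n)]) for t in range(n)]

def highway(x, h, gate_logit):
    """Highway unit: y = g * h + (1 - g) * x, with g = sigmoid(gate_logit)."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * hi + (1.0 - g) * xi for xi, hi in zip(x, h)]

def cbhg_like(seq, K=3):
    """Bank -> pool -> fixed-width projection -> residual add -> highway."""
    pooled = [max_pool_time(channel) for channel in conv_bank(seq, K)]
    # Merge the stacked channels by averaging (stand-in for a learned projection).
    merged = [sum(ch[t] for ch in pooled) / K for t in range(len(seq))]
    projected = conv1d_same(merged, [0.25, 0.5, 0.25])  # fixed-width kernel (assumed)
    residual = [p + x for p, x in zip(projected, seq)]  # add back the original input
    transformed = [max(0.0, r) for r in residual]       # ReLU as the highway transform
    return highway(residual, transformed, gate_logit=0.0)

features = cbhg_like([0.0, 0.0, 1.0, 0.0, 0.0], K=3)
```

Every stage keeps the time dimension unchanged, which is what allows the projection output to be added back to the input sequence.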


Abstract

The invention belongs to the technical field of embedded computers and provides an end-to-end speech synthesis network based on an embedded system. Text passes through a front-end encoding and decoding part to generate a mel-spectrogram, and a vocoder then converts the mel-spectrogram into a speech file. The inference speed is raised to real-time without obvious degradation of voice quality, and the network is finally deployed on an embedded platform. By adopting an end-to-end network and recent neural network methods, the inference speed is greatly improved while the parameter count and model computation are reduced, and adjustable rhythm (prosody) is achieved through a feed-forward rhythm encoder. Because audio can be synthesized efficiently in real time, the method is deployed on a subway embedded platform.
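One common way to quantify the "real-time" claim in the abstract is the real-time factor (RTF): synthesis time divided by the duration of the audio produced, where a value below 1 means synthesis runs faster than playback. The source does not say which metric was used, so the sketch below is only an illustration; `synthesize`, `dummy_synthesize`, and the 22050 Hz sample rate are hypothetical placeholders.

```python
import time

def real_time_factor(synthesis_seconds, audio_seconds):
    """RTF = time spent synthesizing / duration of the generated audio."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return synthesis_seconds / audio_seconds

def measure_rtf(synthesize, text, sample_rate):
    """Time one call to a text-to-waveform function and return its RTF.
    `synthesize` is any callable returning a sequence of audio samples (assumption)."""
    start = time.perf_counter()
    samples = synthesize(text)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)

def dummy_synthesize(text):
    """Stand-in for a real model: returns one second of silence at 22050 Hz."""
    return [0.0] * 22050

rtf = measure_rtf(dummy_synthesize, "next station", sample_rate=22050)
is_real_time = rtf < 1.0
```

On an embedded target the measurement would be taken on the device itself, since the RTF depends on the hardware as much as on the model.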

Description

Technical field

[0001] The invention belongs to the technical field of embedded computers, and relates to an end-to-end speech synthesis model and vocoder based on an embedded system.

Background technique

[0002] As voice interaction with machines becomes more common, speech synthesis technology is used more and more in practice, for example in AI-synthesized news anchors and subway announcements. However, synthesis quality and even small changes in speech can have a large impact on customer experience and preference. High-quality real-time speech synthesis therefore remains a challenging task.

[0003] Currently, advanced speech synthesis models include statistical parametric neural network speech synthesis models and end-to-end speech synthesis models. Text-to-speech synthesis is usually divided into two parts. The first step is to convert the text into time-aligned features, such as mel-spectrograms. The second step is to convert these time-aligned features into audio samples. ...
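The mel-spectrogram named above as the intermediate feature places its frequency bands on the perceptual mel scale. The sketch below uses the widely used formula m = 2595 log10(1 + f / 700); the source does not state which mel variant, how many bands, or what frequency range the invention uses, so `n_mels=80` and the 0 to 8000 Hz range are assumptions chosen only for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_band_edges(n_mels, f_min, f_max):
    """Return n_mels + 2 edge frequencies (Hz), equally spaced on the mel scale.
    Consecutive triples of edges define the triangular filters of a mel filterbank."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_mels + 1)
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 2)]

# Assumed configuration for illustration only.
edges = mel_band_edges(n_mels=80, f_min=0.0, f_max=8000.0)
```

Because the edges are equally spaced in mels, the bands are narrow at low frequencies and wide at high frequencies, which is what makes the representation compact for speech.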


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G10L13/04, G10L13/047, G10L25/24, G10L25/30, G10L19/16

CPC: G10L13/04, G10L13/047, G10L25/24, G10L25/30, G10L19/16

Inventor: 李相

Owner: DALIAN UNIV OF TECH
