A sound conversion system, method and application

A sound conversion technology, applied in speech analysis, speech recognition, instruments and similar fields. It addresses the problems of inflexibility, a sharp increase in computation, and unsuitability for limited computing resources and equipment, with the effects of alleviating inaccurate pronunciation, shortening training time and improving flexibility.

Active Publication Date: 2021-02-12

NANJING SILICON INTELLIGENCE TECH CO LTD

Cites: 6. Cited by: 0.

AI Technical Summary

Problems solved by technology

However, this type of algorithm still has some defects. For example, the classic approach of using a Gaussian mixture model (GMM) for voice conversion is mostly limited to one-to-one conversion tasks and requires the source speaker and the target speaker to record the same training sentences. The mapping between spectral features can be learned through model training only after the spectral features have been aligned frame by frame with Dynamic Time Warping (DTW). Such a voice conversion method is not flexible enough in practical applications. In addition, when a GMM is used to train the mapping function, taking global variables into account and iterating over the training data causes the amount of computation to increase sharply, and the GMM achieves a good conversion effect only when the training data is sufficient. It is therefore unsuitable for limited computing resources and equipment.
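The frame-by-frame DTW alignment mentioned above can be illustrated with a minimal sketch. It is a textbook DTW with Euclidean frame distance, not code from the patent; the function name `dtw_align` and the toy one-dimensional "features" are assumptions made for illustration.

```python
import numpy as np

def dtw_align(source, target):
    """Align two feature sequences (frames x dims); return (total_cost, path)."""
    n, m = len(source), len(target)
    # Euclidean distance between every source frame and every target frame.
    dist = np.linalg.norm(source[:, None, :] - target[None, :, :], axis=-1)
    # acc[i, j] is the minimal accumulated cost of aligning the first i source
    # frames with the first j target frames.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    # Backtrack from the last cell to recover the aligned frame pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates, key=lambda c: c[0])
    path.reverse()
    return float(acc[n, m]), path

# Toy example: the target repeats the middle frame once.
source = np.array([[0.0], [1.0], [2.0]])
target = np.array([[0.0], [1.0], [1.0], [2.0]])
cost, path = dtw_align(source, target)
print(cost, path)
```

The nested loop makes the cost grow with the product of the two sequence lengths, and the alignment is only meaningful when both speakers utter the same sentence, which is the parallel-data requirement the text describes.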

Examples

Embodiment 1

[0046] A sound conversion system,

[0047] comprising:

[0048] (1) A speaker-independent speech recognition (SI-ASR) model with a five-layer DNN structure, in which the fourth layer is a bottleneck layer. The model transforms the Mel-frequency cepstral coefficient (MFCC) features of the source speech into source-speech bottleneck features;
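As a rough illustration of reading features from the fourth layer of a five-layer network, the sketch below runs MFCC frames through the first four layers and returns that layer's activations. The layer widths, the ReLU activation, the random weights and the name `extract_bottleneck` are all assumptions; the excerpt does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 39-dim MFCC input, three wide hidden layers,
# a narrow 256-dim bottleneck as layer 4, and 100 output classes as layer 5.
LAYER_SIZES = [39, 512, 512, 512, 256, 100]

# One (weights, bias) pair per layer, five layers in total (untrained here).
params = [
    (rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out))
    for n_in, n_out in zip(LAYER_SIZES[:-1], LAYER_SIZES[1:])
]

def extract_bottleneck(mfcc, params, bottleneck_layer=4):
    """Return the activations of the bottleneck layer for each input frame."""
    hidden = mfcc
    # Only layers 1..bottleneck_layer are evaluated; the output layer is skipped.
    for weights, bias in params[:bottleneck_layer]:
        hidden = np.maximum(hidden @ weights + bias, 0.0)  # ReLU (assumed)
    return hidden

mfcc_frames = rng.normal(size=(50, 39))       # 50 frames of made-up MFCCs
bn_features = extract_bottleneck(mfcc_frames, params)
print(bn_features.shape)                      # (50, 256)
```

The output layer is needed only while training the recognizer; at feature-extraction time the network is simply truncated at the bottleneck.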

[0049] An ASR model converts speech into text. The model outputs the probability of each word that the audio may correspond to, and the PPG is the carrier of these probabilities. PPG-based methods use the PPG as the output of the SI-ASR model.

[0050] PPG stands for phonetic posteriorgram, a matrix that maps each audio time frame to the posterior probability of each phoneme category. To a certain extent, a PPG represents the rhythm and prosody of the spoken content while removing features related to the speaker's timbre, so it is speaker-independent. PPG is defined as follows:

[0051] P_t=(p(s...
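Independently of the truncated definition above, the general shape of a posteriorgram can be sketched as a matrix with one row per time frame and one column per class, where each row is a probability distribution. The example below assumes the per-frame class scores come from some classifier and applies a softmax; the function name `posteriorgram` and the random scores are illustrative only.

```python
import numpy as np

def posteriorgram(scores):
    """Turn per-frame class scores (frames x classes) into posterior probabilities."""
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = scores - scores.max(axis=1, keepdims=True)
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
frame_scores = rng.normal(size=(5, 4))   # 5 frames, 4 hypothetical phoneme classes
ppg = posteriorgram(frame_scores)
print(ppg.shape)                          # (5, 4)
print(ppg.sum(axis=1))                    # every row sums to 1 (up to rounding)
```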

Embodiment 2

[0065] A training method for the sound conversion system, comprising the following three parts, A1 to A3:

[0066] A1. SI-ASR model (speaker-independent speech recognition model) training phase. This phase trains the SI-ASR model, which is used to extract bottleneck features (BN features) both in the training phase of the attention voice-changing network and in the voice conversion phase. The model is trained on a corpus that contains many speakers. Once trained, it can be applied to any source speaker, that is, it is speaker-independent (SI), hence the name SI-ASR model. The trained model can be used directly afterwards without retraining.

[0067] The SI-ASR model training phase consists of the following steps (see Figure 1):

[0068] B1. Preprocessing the mul...

Embodiment 3

[0096] Embodiment 3, a sound conversion method.

[0097] Sound conversion is performed on the input source speech, which is transformed into a target speech signal for output: the output speech has the voice characteristics of the target speaker, while its content is the same as that of the source speech.

[0098] The sound conversion phase consists of the following steps (see Figure 4):

[0099] E1. Perform parameter extraction on the source speech to be converted to obtain MFCC features;

[0100] E2. Use the SI-ASR model trained in B3 to transform the MFCC features into BN features;

[0101] E3. Use the attention voice-changing network trained in C5 to transform the BN features into acoustic features (mel spectrum);

[0102] E4. Use the neural network vocoder trained in D4 to convert the acoustic features (mel spectrum) into speech output.
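The data flow of steps E1 to E4 can be sketched as a simple chain of four stages. The stage functions below are placeholders that only produce arrays of plausible shapes; the frame hop of 160 samples and the feature sizes (13, 256, 80) are assumptions, and none of the names come from the patent.

```python
import numpy as np

def convert_voice(source_wave, extract_mfcc, bottleneck_extractor,
                  attention_network, vocoder):
    """Chain steps E1 to E4; each stage is supplied as a callable."""
    mfcc = extract_mfcc(source_wave)     # E1: waveform -> MFCC features
    bn = bottleneck_extractor(mfcc)      # E2: MFCC -> BN features (SI-ASR model)
    mel = attention_network(bn)          # E3: BN features -> mel spectrum
    return vocoder(mel)                  # E4: mel spectrum -> waveform

HOP = 160  # hypothetical number of samples per frame

# Placeholder stages: they return zeros and only demonstrate the shapes.
def fake_mfcc(wave):
    return np.zeros((len(wave) // HOP, 13))

def fake_bottleneck(mfcc):
    return np.zeros((mfcc.shape[0], 256))

def fake_attention(bn):
    return np.zeros((bn.shape[0], 80))

def fake_vocoder(mel):
    return np.zeros(mel.shape[0] * HOP)

output = convert_voice(np.zeros(16000), fake_mfcc, fake_bottleneck,
                       fake_attention, fake_vocoder)
print(output.shape)  # (16000,)
```

Passing the stages in as callables mirrors the fact that each one is trained separately (B3, C5, D4) and only composed at conversion time.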

[0103] In this way, the trained speaker-independent speech recognition model can be used for any source speake...

Abstract

The present invention proposes a voice conversion scheme trained on non-parallel corpora. It removes the dependence on parallel text and solves the technical problem that voice conversion is difficult to achieve with limited resources and equipment. The invention comprises a voice conversion system, a method and a corresponding terminal. Compared with the prior art, it has the following advantages. The trained speaker-independent speech recognition model can be used for any source speaker, that is, it is speaker-independent. The audio bottleneck feature is more abstract than the phonetic posteriorgram (PPG) feature: it reflects the content of the speech and decouples it from the speaker's timbre, yet it is not bound so tightly to phoneme categories and is not a strict one-to-one correspondence, which to a certain extent alleviates the inaccurate pronunciation caused by ASR recognition errors. Audio obtained by using the bottleneck feature for sound conversion has significantly higher pronunciation accuracy than audio obtained with the PPG method, with no significant difference in timbre. Using transfer learning greatly reduces the dependence on the training corpus.

Description

Technical field

[0001] The present invention relates to the field of speech computation algorithms, in particular to a sound conversion system, a method and a terminal to which they are applied.

Background technique

[0002] With the continuous development of computer technology and continuing advances in artificial intelligence, voice robots built for voice interaction have gradually entered the public eye. The emergence of voice robots has changed the nature of existing telephone services. At present, voice robots are used in real estate, education, finance, tourism and other industries to handle voice interaction, replacing human agents in voice interactions with users.

[0003] To optimize customer experience, using voice conversion technology to change the voice characteristics of voice robots is one of the important directions for improvement.

[0004] Speech conversion technology is a research branch of speech signal processing. It cov...

Application Information

Patent Type & Authority: Patents (China)

IPC(8): G10L15/06, G10L15/02, G10L19/16, G10L25/24, G10L25/30

CPC: G10L15/063, G10L15/02, G10L19/173, G10L25/24, G10L25/30, G10L2015/025, G10L21/003, G10L2021/0135, G10L15/16

Inventor: 司马华鹏, 毛志强, 龚雪飞

Owner: NANJING SILICON INTELLIGENCE TECH CO LTD
