A system and method for training timbre and rhythm cloning based on bottleneck features

A feature-based training system and training method, applied in speech recognition, voice cloning, and speech synthesis within the artificial intelligence (intelligent speech) field. It addresses the slow market response, high labor cost, and service difficulties of customized speech synthesis, and achieves a shorter production cycle and a smaller required corpus.

Active Publication Date: 2021-08-03
NANJING SILICON INTELLIGENCE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] With the rapid development of the telephone robot business market, the rapid growth in the volume of intelligent voice services has created great difficulties for customized speech synthesis (TTS) services. A single customized TTS service requires nearly 10,000 real recording samples, and the production cycle, from sample collection, data labeling, and data preprocessing through model training to service provision, is nearly one month and requires a lot of labor. This delay cannot keep up with market demand.

Examples

Detailed Description of the Embodiments

[0033] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.

[0034] As shown in Figure 1, the present invention provides a system for training timbre and rhythm cloning based on bottleneck features, including:

[0035] (1) A data acquisition module, used to collect the speech recognition module (ASR Model) corpus, the basic TTB model corpus for the prosody module (TTTBModel), the multi-speaker acoustic model (Multi-speaker Acoustic Model) corpus, and the clone corpus (audio of the target user and the corresponding text);

[0036] (2) An acoustic feature extraction module, which extracts linear predictive coding features (LPC features) and Mel-frequency cepstral coefficients (MFCC) as acoustic features;

[...
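The linear predictive coding features named in module (2) can be illustrated with a minimal sketch. It assumes the common autocorrelation method with the Levinson-Durbin recursion. The source does not state the LPC order, the frame length, or which algorithm is used, so the function names `autocorrelation` and `lpc`, the order, and the test signal below are hypothetical choices for illustration only. MFCC extraction is not shown.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i + lag] for i in range(n - lag))
        for lag in range(max_lag + 1)
    ]


def lpc(frame, order):
    """Estimate LPC coefficients with the Levinson-Durbin recursion.

    Returns (coeffs, error), where the prediction is
    x[n] ~= sum(coeffs[k] * x[n - 1 - k] for k in range(order))
    and error is the residual prediction energy.
    The order is an assumption; the source does not specify one.
    """
    r = autocorrelation(frame, order)
    if r[0] == 0.0:
        # Silent frame: nothing to predict.
        return [0.0] * order, 0.0

    coeffs = [0.0] * order
    error = r[0]
    for i in range(order):
        # Reflection coefficient for model order i + 1.
        acc = r[i + 1] - sum(coeffs[j] * r[i - j] for j in range(i))
        k = acc / error
        updated = coeffs[:]
        updated[i] = k
        for j in range(i):
            updated[j] = coeffs[j] - k * coeffs[i - 1 - j]
        coeffs = updated
        error *= 1.0 - k * k
    return coeffs, error


if __name__ == "__main__":
    # Hypothetical test signal: a decaying first-order process x[n] = 0.9 * x[n-1].
    signal = [0.9 ** n for n in range(200)]
    coeffs, error = lpc(signal, order=2)
    print(coeffs, error)
```

For the test signal above, the first coefficient comes out very close to 0.9 and the second very close to 0, which matches the process that generated it. In a real front end each frame would typically be windowed before this step; that detail is also not given in the source.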

Abstract

The present invention relates to the technical fields of speech synthesis, speech recognition, and voice cloning. It combines speech synthesis, speech recognition, and transfer learning to provide a voice cloning scheme based on bottleneck features (linguistic features of audio), including a training system and a training method. Using a small number of samples, it provides a TTS service with high naturalness and similarity that carries the characteristics of the target user, and it solves the problems of large sample size, long production cycle, and high labor cost in speech synthesis services. The training system includes a data acquisition module, an acoustic feature extraction module, a speech recognition module, a prosody module, a multi-speaker acoustic module, and a speech synthesis module. The present invention also provides a training method based on this system, including preparing the training corpus, extracting acoustic features, training and fine-tuning each module, and synthesizing speech.

Description

Technical field

[0001] The invention relates to speech synthesis technology (TTS), speech recognition technology (ASR), and voice cloning technology, and belongs to the intelligent speech branch of artificial intelligence.

Background art

[0002] With the rapid development of the telephone robot business market, the rapid growth in the volume of intelligent voice services has created great difficulties for customized speech synthesis (TTS) services. A single customized TTS service requires nearly 10,000 real recording samples, and the production cycle, from sample collection, data labeling, and data preprocessing through model training to service provision, is nearly one month and requires a lot of labor. This delay cannot keep up with market demand. Currently, TTS mainly includes two technical solutions: staged speech synthesis and end-to-end speech synthesis. The purpose of timbre and rhythm cloning is to synthesize ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L13/02, G10L15/02, G10L15/06, G10L15/16, G10L25/03, G10L25/24, G10L25/30, G10L25/12
CPC: G10L13/02, G10L15/02, G10L15/063, G10L15/16, G10L25/03, G10L25/12, G10L25/24, G10L25/30
Inventor: 司马华鹏, 龚雪飞
Owner: NANJING SILICON INTELLIGENCE TECH CO LTD