
Speech speed estimation model training, speech speed estimation method, device, equipment and medium

A method for training a speech rate estimation model and for estimating speech rate. It applies to speech analysis, speech recognition, and related instruments, and addresses the low robustness of existing methods and their inability to predict the true value of the speech rate.

Active Publication Date: 2018-03-09
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0006] The present invention provides speech rate estimation model training, a speech rate estimation method, a device, equipment and a medium, to solve the problems in the prior art that speech rate estimation methods have low robustness and cannot predict the true value of the speech rate.



Examples


Embodiment 1

[0056] Figure 1 is a flow chart of the speech rate estimation model training proposed by the embodiment of the present invention. The specific process is as follows:

[0057] S101: For each sentence in the preset speech corpus, perform syllable labeling on the sentence according to the preset syllables; divide the sentence into a plurality of first speech segments, and determine the speech rate value of each first speech segment according to the number of syllables it contains.
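
The last part of S101 can be illustrated with a minimal Python sketch. It assumes the syllable labels are available as onset times in seconds, that a syllable belongs to a segment when its onset falls inside it, and that the speech rate value is expressed in syllables per second; none of these details is confirmed by the text above, and the function name `speech_rates` is hypothetical.

```python
from typing import List, Tuple


def speech_rates(syllable_onsets: List[float],
                 segments: List[Tuple[float, float]]) -> List[float]:
    """Return a speech rate (syllables per second) for each (start, end) segment.

    Assumption: a syllable is counted in a segment when its labeled onset
    time t satisfies start <= t < end.
    """
    rates = []
    for start, end in segments:
        duration = end - start
        if duration <= 0:
            raise ValueError("each segment must have a positive duration")
        # Count the labeled syllables whose onset falls inside this segment.
        count = sum(1 for t in syllable_onsets if start <= t < end)
        rates.append(count / duration)
    return rates


# Illustrative labels: five syllable onsets in a two-second sentence.
onsets = [0.1, 0.4, 0.7, 1.2, 1.6]
segments = [(0.0, 1.0), (0.5, 1.5), (1.0, 2.0)]
print(speech_rates(onsets, segments))
```

Because the segments may overlap, one syllable can contribute to the rate of more than one segment.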

[0058] In the embodiments of the present invention, the preset speech corpus used to train the speech rate estimation model may be a corpus containing speech in languages from around the world. For example, the preset speech corpus may be the 863 four-major-dialect Mandarin speech corpus, a German speech corpus, a French speech corpus, the Acoustic-Phonetic Continuous Speech Corpus (TIMIT), etc.

[0059] Preferably, the embodiment of the present invention can ...

Embodiment 2

[0085] In order to capture the dynamic change of speech rate within speech and improve the accuracy of the speech rate estimation results, on the basis of the above embodiment, in this embodiment of the present invention:

[0086] Dividing the sentence into a plurality of first speech segments includes:

[0087] The sentence is divided into a plurality of first speech segments, each with a duration of 1 second, wherein each first speech segment overlaps the adjacent preceding first speech segment by 0.5 seconds.

[0088] Specifically, the sentences in the preset speech corpus are divided into a plurality of first speech segments as follows: for each sentence in the preset speech corpus, the duration is known, with a time precision of seconds. Considering that the duration of each sentence may differ (for example, sentence a lasts 10 seconds and sentence b lasts 7.8 seconds)...
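
The 1-second window with 0.5-second overlap can be sketched in Python as follows. The paragraph above is cut off before it says how a trailing portion shorter than 1 second is handled, so this sketch simply drops it; that choice, the function name `split_into_segments`, and its parameters are assumptions for illustration only.

```python
from typing import List, Tuple


def split_into_segments(duration_s: float,
                        window_s: float = 1.0,
                        hop_s: float = 0.5) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of overlapping fixed-length segments.

    Assumption: a trailing portion shorter than one full window is dropped.
    """
    # Work in integer milliseconds to avoid floating-point drift.
    total_ms = round(duration_s * 1000)
    window_ms = round(window_s * 1000)
    hop_ms = round(hop_s * 1000)
    if window_ms <= 0 or hop_ms <= 0:
        raise ValueError("window and hop must be positive")
    segments = []
    start_ms = 0
    while start_ms + window_ms <= total_ms:
        segments.append((start_ms / 1000, (start_ms + window_ms) / 1000))
        start_ms += hop_ms  # a 0.5 s hop gives a 0.5 s overlap for 1 s windows
    return segments


# The two sentence durations used as examples in the text.
print(len(split_into_segments(10.0)))   # number of segments for sentence a
print(split_into_segments(7.8)[-1])     # last full segment for sentence b
```

Under these assumptions the 10-second sentence yields 19 segments and the 7.8-second sentence yields 14, the last one covering 6.5 to 7.5 seconds.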

Embodiment 3

[0102] Figure 4 is a flow chart of the speech rate estimation method proposed in the embodiment of the present invention. The specific process is as follows:

[0103] S401: Divide the sentence to be estimated into multiple second speech segments.

[0104] Each sentence to be estimated can be divided into a plurality of second speech segments. Various division methods can be used: the sentence may be divided into multiple second speech segments of equal or unequal length, such that splicing the second speech segments of a sentence together yields the complete sentence. In addition, when the second speech segments are determined, every two adjacent segments may overlap, and so on.

[0105] Specifically, in the embodiment of the present invention, the method for dividing the sentence into multiple second speech segments includes but is not limi...


Abstract

The invention discloses speech rate estimation model training, a speech rate estimation method, a device, equipment and a medium, and aims to solve the problem that existing speech rate estimation methods cannot predict the true value of the speech rate. The method comprises the following steps: labeling syllables for each sentence in a preset speech corpus according to preset syllables; dividing the sentence into a plurality of first speech segments, and determining the speech rate value of each first speech segment according to the number of syllables it contains; dividing each first speech segment into a preset number of first speech units, and extracting the audio features of each first speech unit; and training an LSTM model using the audio features of each first speech unit in a first speech segment together with the speech rate value of the speech segment that follows that first speech segment. Because the method labels the syllables of the sentences in the speech corpus and determines the true speech rate values, the LSTM model can estimate the true speech rate value of a sentence to be estimated.
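
The way training examples are assembled in the abstract can be sketched in Python. This is only an illustration of the pairing step, not the LSTM training itself. It reads "the speech segment after the first speech segment" as the next segment in the sentence, and it uses the mean absolute amplitude of each unit as a stand-in feature because the excerpt does not name the audio features. The names `split_into_units`, `unit_feature`, and `build_training_pairs` are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_into_units(samples: Sequence[float], num_units: int) -> List[List[float]]:
    """Split one segment's samples into num_units contiguous speech units."""
    if num_units <= 0 or len(samples) < num_units:
        raise ValueError("need at least one sample per unit")
    bounds = [i * len(samples) // num_units for i in range(num_units + 1)]
    return [list(samples[bounds[i]:bounds[i + 1]]) for i in range(num_units)]


def unit_feature(unit: Sequence[float]) -> List[float]:
    """Placeholder feature (assumption): mean absolute amplitude of the unit."""
    return [sum(abs(x) for x in unit) / len(unit)]


def build_training_pairs(segments: List[Sequence[float]],
                         rates: List[float],
                         num_units: int) -> List[Tuple[List[List[float]], float]]:
    """Pair each segment's unit-feature sequence with the next segment's rate."""
    if len(segments) != len(rates):
        raise ValueError("one speech rate value is required per segment")
    pairs = []
    # The last segment has no following segment, so it yields no pair.
    for i in range(len(segments) - 1):
        features = [unit_feature(u) for u in split_into_units(segments[i], num_units)]
        pairs.append((features, rates[i + 1]))
    return pairs


# Toy data: three segments of four samples each, with their speech rate values.
toy_segments = [[1, -1, 2, -2], [0, 0, 4, 4], [1, 1, 1, 1]]
toy_rates = [3.0, 2.0, 4.0]
print(build_training_pairs(toy_segments, toy_rates, num_units=2))
```

Each pair holds one feature vector per speech unit, which matches the sequence-shaped input an LSTM consumes, and a single target speech rate value.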

Description

Technical field

[0001] The present invention relates to the technical field of speech rate prediction, and in particular to speech rate estimation model training, a speech rate estimation method, a device, equipment and a medium.

Background technique

[0002] As an important prosodic means of emotional expression, speech rate is the basis of language rhythm. It has broad application prospects in emotion recognition, speech rate compensation in speech recognition, and the medical evaluation of language fluency in aphasia. Existing speech rate measurement schemes fall into two categories: those based on syllable detection algorithms and those based on machine learning algorithms.

[0003] In algorithms based on syllable detection, speech rate is usually measured as the number of syllables per second or the number of phonemes per second. Generally, the energy envelope, frequency domain features or zero-crossing rate of ...


Application Information

IPC(8): G10L15/02, G10L15/06, G10L15/14, G10L25/24
CPC: G10L15/02, G10L15/063, G10L15/14, G10L25/24, G10L2015/025, G10L2015/027
Inventor: 谢湘, 肖艳红
Owner: BEIJING INSTITUTE OF TECHNOLOGY