Speech rate estimation model training method, speech rate estimation method, device, equipment and medium
A speech rate estimation model and a method for training it. The technology applies to speech analysis, speech recognition, instruments, and similar fields, and addresses problems such as low robustness and the inability to predict the true value of speech rate.
Examples
Embodiment 1
[0056] Figure 1 is a flow chart of the speech rate estimation model training proposed by the embodiment of the present invention. The specific process is as follows:
[0057] S101: For each sentence in the preset speech corpus, perform syllable labeling on the sentence according to the preset syllables; divide the sentence into a plurality of first speech segments; and, according to the number of syllables contained in each first speech segment, determine the speech rate value of each first speech segment.
[0058] In the embodiments of the present invention, the preset speech corpus used to train the speech rate estimation model may be a corpus containing speech in languages from around the world. For example, the preset speech corpus may be the 863 four major dialect Mandarin speech corpora, a German speech corpus, a French speech corpus, the Acoustic-Phonetic Continuous Speech Corpus (TIMIT corpus), etc.
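The labeling step in S101 can be illustrated with a minimal sketch. It assumes that each syllable label carries start and end times, that the speech rate value is expressed in syllables per second, and that a syllable belongs to the segment containing its midpoint. None of these details is stated in the text above, and the names `Syllable` and `speech_rate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Syllable:
    """One labeled syllable with its time span in seconds (assumed label format)."""
    label: str
    start: float
    end: float


def speech_rate(syllables: Sequence[Syllable], seg_start: float, seg_end: float) -> float:
    """Return syllables per second for the segment [seg_start, seg_end).

    Assumption: a syllable is counted in a segment when its midpoint
    falls inside that segment.
    """
    duration = seg_end - seg_start
    if duration <= 0:
        raise ValueError("segment must have positive duration")
    count = sum(
        1 for s in syllables
        if seg_start <= (s.start + s.end) / 2 < seg_end
    )
    return count / duration


# Illustrative labels only; not taken from any corpus.
labels = [
    Syllable("ni", 0.0, 0.2),
    Syllable("hao", 0.2, 0.5),
    Syllable("shi", 0.6, 0.9),
    Syllable("jie", 1.0, 1.3),
]
rate_first_second = speech_rate(labels, 0.0, 1.0)
```

Counting by midpoint is only one possible convention; a syllable that straddles a segment boundary could also be counted fractionally.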
[0059] Preferably, the embodiment of the present invention can ...
Embodiment 2
[0085] In order to capture the dynamic change of speech rate within speech and improve the accuracy of speech rate estimation results, on the basis of the above-mentioned embodiments, in the embodiment of the present invention:
[0086] Dividing the sentence into a plurality of first speech segments includes:
[0087] The sentence is divided into a plurality of first speech segments with a duration of 1 second, wherein each subsequent first speech segment overlaps with the preceding first speech segment adjacent to it for 0.5 seconds.
[0088] Specifically, the sentences in the preset speech corpus are divided into a plurality of first speech segments as follows: for each sentence in the preset speech corpus, the duration of the sentence is known, with a time precision of seconds. Considering that the duration of each sentence may be different (for example, the duration of sentence a is 10 seconds, and the duration of sentence b is 7.8 seconds)...
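The 1-second window with 0.5-second overlap described in [0087] can be sketched as a sliding window with a 0.5-second hop. The paragraph above is truncated before it explains how a sentence whose duration is not covered by whole windows (such as 7.8 seconds) is handled, so this sketch simply drops any trailing remainder shorter than one window; that choice is an assumption, and the function name `split_with_overlap` is hypothetical.

```python
from typing import List, Tuple


def split_with_overlap(
    duration: float, window: float = 1.0, overlap: float = 0.5
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of overlapping segments.

    Each segment lasts `window` seconds and overlaps the previous one
    by `overlap` seconds. Assumption: a trailing remainder shorter than
    `window` is dropped.
    """
    if window <= 0 or not 0 <= overlap < window:
        raise ValueError("require window > 0 and 0 <= overlap < window")
    hop = window - overlap
    segments = []
    i = 0
    while True:
        start = i * hop
        end = start + window
        if end > duration + 1e-9:  # small tolerance for float rounding
            break
        segments.append((round(start, 6), round(end, 6)))
        i += 1
    return segments


segments_a = split_with_overlap(10.0)  # sentence a from the example
segments_b = split_with_overlap(7.8)   # sentence b from the example
```

Under these assumptions the 10-second sentence yields 19 segments and the 7.8-second sentence yields 14, the last of which covers 6.5 to 7.5 seconds.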
Embodiment 3
[0102] Figure 4 is a flow chart of the speech rate estimation method proposed in the embodiment of the present invention. The specific process is as follows:
[0103] S401: Divide the sentence to be estimated into multiple second speech segments.
[0104] Each sentence to be estimated can be divided into a plurality of second speech segments. Various division methods can be used: the sentence to be estimated can be divided into multiple second speech segments of equal or unequal length, such that splicing the second speech segments of a sentence together yields the complete sentence. In addition, when the second speech segments are determined, every two adjacent speech segments can overlap, and so on.
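The non-overlapping case in [0104], where segments may differ in length but splice back into the complete sentence, can be sketched as follows. The sketch assumes the sentence is available as a sequence of audio samples and that the cut points are given as sample indices; the function name `split_at` is hypothetical.

```python
from typing import List, Sequence


def split_at(samples: Sequence[float], boundaries: Sequence[int]) -> List[List[float]]:
    """Cut `samples` at the given sample indices.

    Segments may have unequal lengths; concatenating them in order
    reproduces the original sequence.
    """
    cuts = [0, *boundaries, len(samples)]
    if any(a > b for a, b in zip(cuts, cuts[1:])):
        raise ValueError("boundaries must be sorted and lie within the signal")
    return [list(samples[a:b]) for a, b in zip(cuts, cuts[1:])]


# Illustrative signal of 10 samples, cut into segments of 3, 4 and 3 samples.
signal = [float(n) for n in range(10)]
parts = split_at(signal, [3, 7])
rejoined = [x for part in parts for x in part]
```

When adjacent segments overlap instead, simple concatenation no longer reproduces the sentence, because the overlapping samples would appear twice.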
[0105] Specifically, in the embodiment of the present invention, the method for dividing the sentence into multiple second speech segments includes but is not limi...