Voice recognition method based on glottis wave information

A speech recognition technology based on glottal waves, applied in speech recognition, speech analysis, instruments, etc. It addresses the unavoidable inter-frame feature repetition and overfitting of frame-based methods, the limited contribution of existing features to recognition accuracy, and the inability of existing methods to fully describe the mechanism and characteristics of vocal cord vibration. It avoids the repetition of features between frames and improves the recognition results.

Active Publication Date: 2021-04-30
SUZHOU UNIV
Cites: 6 | Cited by: 0

AI Technical Summary

Problems solved by technology

Most technical solutions consider improving speech recognition performance only in terms of which feature, or which classification algorithm, performs better. These solutions use the original speech signal as the source signal for analysis and feature extraction, so they cannot eliminate the influence of lip radiation and vocal tract resonance, and they struggle to highlight the important characteristics of the glottal excitation produced by vocal cord vibration during phonation.
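Removing vocal tract resonance and lip radiation from recorded speech is usually done by inverse filtering. This page does not say which inverse-filtering algorithm the patent uses, so the following is only a minimal single-pass sketch, assuming a linear-prediction (LPC) model of the vocal tract and a first-order differentiator model of lip radiation; refined methods such as iterative adaptive inverse filtering (IAIF) iterate on the same idea. The names `lpc` and `estimate_glottal_flow` and the `leak` parameter are hypothetical.

```python
from typing import Optional

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter


def lpc(x: np.ndarray, order: int) -> np.ndarray:
    """LPC polynomial A(z) = [1, a1, ..., ap] by the autocorrelation method."""
    xw = x * np.hamming(len(x))
    # Autocorrelation at lags 0..order.
    r = np.array([np.dot(xw[: len(xw) - k], xw[k:]) for k in range(order + 1)])
    # Solve the symmetric Toeplitz normal equations R a = -r[1:].
    a = solve_toeplitz(r[:order], -r[1:])
    return np.concatenate(([1.0], a))


def estimate_glottal_flow(speech: np.ndarray, fs: int, order: Optional[int] = None,
                          leak: float = 0.99) -> np.ndarray:
    """Single-pass glottal flow estimate (illustrative; not the patent's stated algorithm)."""
    if order is None:
        order = 2 + fs // 1000  # common rule of thumb for the vocal-tract LPC order
    a = lpc(speech, order)
    # Inverse filtering with A(z) cancels the modelled vocal-tract resonances,
    # leaving roughly the glottal flow derivative (lip radiation ~ 1 - z^-1 remains).
    flow_derivative = lfilter(a, [1.0], speech)
    # Leaky integrator 1 / (1 - leak * z^-1) approximately undoes the lip-radiation differentiator.
    return lfilter([1.0], [1.0, -leak], flow_derivative)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic check: an AR(2) "vocal tract" driven by white noise.
    x = lfilter([1.0], [1.0, -1.3, 0.6], rng.standard_normal(20000))
    print(np.round(lpc(x, 2), 2))  # compare with the true coefficients [1, -1.3, 0.6]
    print(estimate_glottal_flow(x, 8000).shape)
```

A single LPC fit on the speech also absorbs part of the glottal spectral tilt into A(z), which is the weakness that iterative schemes are designed to reduce.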
Some technical solutions use glottal features as supplementary features that make the feature set describe the characteristics of the speech signal more completely. However, the core of the feature set is still mostly cepstral, nonlinear, or perturbation-measure features, and the glottal features have not yet produced a visible improvement in recognition performance. In summary, prior-art speech recognition methods have the following defects:
[0004] (1) The features extracted by existing speech recognition methods depend on accurate estimation of the pitch frequency, so the value of glottal wave information in speech recognition cannot be fully exploited and its contribution to recognition accuracy is limited.
[0005] (2) Existing speech recognition methods process the original speech signal, which has already undergone vocal tract resonance and lip radiation, as the source signal. They therefore fail to eliminate the influence of vocal tract resonance and lip radiation and cannot fully describe the mechanism and characteristics of vocal fold vibration.
[0006] (3) Some speech recognition methods extract features with the frame as the analysis unit, which inevitably repeats, and overfits to, the samples shared by overlapping frames (the frame-shift overlap), reducing the robustness and reliability of the recognition results.
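Defect (3) can be quantified. With frame length N and frame shift H < N, consecutive frames share N - H samples, so much of the framed data consists of copies. The sketch below assumes a 25 ms frame and 10 ms shift at 16 kHz (typical values, not taken from the patent); `frame_signal` and `duplication_ratio` are hypothetical helpers.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (no padding; trailing samples dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def duplication_ratio(n_samples: int, frame_len: int, hop: int) -> float:
    """Fraction of framed samples that repeat a sample already seen in an earlier frame."""
    n_frames = 1 + (n_samples - frame_len) // hop
    total = n_frames * frame_len
    unique = frame_len + (n_frames - 1) * hop  # valid when hop <= frame_len
    return (total - unique) / total


if __name__ == "__main__":
    fs = 16000
    x = np.random.default_rng(0).standard_normal(fs)  # 1 s of noise as a stand-in signal
    frames = frame_signal(x, frame_len=400, hop=160)  # 25 ms frames, 10 ms shift
    print(frames.shape)                                # (98, 400)
    print(round(duplication_ratio(len(x), 400, 160), 3))  # 0.594
```

Under these settings about 59% of the framed samples duplicate earlier ones, so per-frame statistics are strongly correlated; this is one way to read the repetition described in (3).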

Embodiment Construction

[0036] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments, so that those skilled in the art can better understand the present invention and implement it, but the examples given are not intended to limit the present invention.

[0037] Referring to Figure 1, an embodiment of the speech recognition method based on glottal wave information of the present invention comprises three steps: source signal preprocessing, feature extraction, and classification and recognition. Source signal preprocessing extracts the glottal wave signal from the original speech signal as the source signal for feature extraction; feature extraction uses the Moving Picture Experts Group standard MPEG-7 to extract high-order audio statistical features and combines them with openSMILE features and classic glottal features to form the feature set for glottal wave signal recognition; the classification and recognition step performs predictive classification...
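This excerpt does not list which "classic glottal features" are used. As one example of such a feature, the normalized amplitude quotient (NAQ) of Alku et al. is computed per glottal cycle as f_ac / (d_peak · T), where f_ac is the peak-to-peak flow amplitude, d_peak the magnitude of the negative peak of the flow derivative, and T the cycle length. The sketch assumes the glottal wave has already been segmented into cycles (for example at glottal closure instants); the function name `naq` is hypothetical.

```python
import numpy as np


def naq(glottal_cycle: np.ndarray, fs: float) -> float:
    """Normalized amplitude quotient of one glottal cycle: f_ac / (d_peak * T)."""
    g = np.asarray(glottal_cycle, dtype=float)
    f_ac = g.max() - g.min()              # peak-to-peak (AC) flow amplitude
    d_peak = -np.min(np.diff(g)) * fs     # magnitude of the negative derivative peak, per second
    period = len(g) / fs                  # cycle length T in seconds
    return float(f_ac / (d_peak * period))


if __name__ == "__main__":
    fs = 10000
    # Idealised 10 ms cycle: 6 ms linear opening ramp, 2 ms linear closing ramp, remainder closed.
    cycle = np.concatenate([
        np.linspace(0.0, 1.0, 61),        # opening phase, ending at the flow peak
        np.linspace(1.0, 0.0, 21)[1:],    # closing phase (20 samples)
        np.zeros(19),                     # closed phase
    ])
    print(round(naq(cycle, fs), 3))  # 0.2: for a linear closing ramp, NAQ = closing time / T
```

NAQ is amplitude-invariant and needs only one cycle, so it does not require a separate fundamental-frequency track beyond the cycle boundaries.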

Abstract

The invention discloses a voice recognition method based on glottal wave information. The method comprises three steps: source signal preprocessing, feature extraction, and classification and recognition. Source signal preprocessing extracts the glottal wave signal from the original voice signal as the source signal for feature extraction. Feature extraction uses the Moving Picture Experts Group standard MPEG-7 to extract high-order audio statistical features and combines them with openSMILE features and classic glottal features to form the feature set for glottal wave signal recognition. Classification and recognition are based on a random forest classifier, and predictive classification is carried out with ten-fold cross-validation. Because the glottal wave is used as the source signal, the method fully represents the role of glottal excitation and the vocal cord vibration mechanism in speech recognition. Using MPEG-7 to extract high-order audio statistical features and combining them with openSMILE features and classic glottal features as the recognition feature set solves the problem of inter-frame repetition and overfitting, and the method does not depend on the result of pitch (fundamental frequency) estimation.
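The classification step can be sketched with scikit-learn. The MPEG-7 descriptors and openSMILE features come from external tools and are not specified here, so `high_order_stats` is a hypothetical stand-in (skewness and kurtosis of the whole glottal signal and of its magnitude spectrum, giving one vector per recording rather than per frame). `n_estimators=100`, the stratified and shuffled 10-fold split, and the helper names are assumptions, not details from the patent.

```python
from typing import Tuple

import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score


def high_order_stats(glottal: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for MPEG-7-based high-order statistics of one recording."""
    spectrum = np.abs(np.fft.rfft(glottal))
    return np.array([skew(glottal), kurtosis(glottal), skew(spectrum), kurtosis(spectrum)])


def build_feature_vector(hos, opensmile_feats, glottal_feats) -> np.ndarray:
    """Concatenate the three feature groups into one recognition feature vector."""
    return np.concatenate([hos, opensmile_feats, glottal_feats])


def evaluate(X: np.ndarray, y: np.ndarray, seed: int = 0) -> Tuple[float, float]:
    """Mean and standard deviation of random-forest accuracy under 10-fold cross-validation."""
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    return float(scores.mean()), float(scores.std())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    y = np.repeat([0, 1], 40)  # 40 synthetic recordings per class
    X = np.array([
        build_feature_vector(
            high_order_stats(rng.standard_normal(2048)),
            rng.normal(loc=label, size=6),  # placeholder for openSMILE features
            rng.normal(loc=label, size=3),  # placeholder for classic glottal features
        )
        for label in y
    ])
    print(evaluate(X, y))
```

Stratified folds keep the class ratio equal in every fold, which matters when each fold holds only a handful of recordings.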

Description

Technical field

[0001] The invention relates to the field of speech recognition, in particular to a speech recognition method based on glottal wave information.

Background art

[0002] Speech recognition technology can generally be decomposed into stages such as signal preprocessing, feature extraction, and classification and recognition. By principle, the features extracted in speech recognition can be divided into perturbation measures based on the pitch frequency and signal amplitude, cepstral features obtained by cepstral analysis of the spectrum, nonlinear features obtained by nonlinear dynamic analysis, and glottal features estimated by inverse filtering algorithms. Among them, the perturbation and nonlinear features are constrained by the accuracy of pitch frequency estimation and perform worse than cepstral features. Glottal features are mostly used as supplementary features, and their value in speech recognition has no...
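Why perturbation features inherit pitch-estimation errors follows from their definitions. Local jitter and local shimmer are the mean absolute difference between consecutive pitch periods (or cycle peak amplitudes) divided by their mean, as in the common Praat-style definitions, so a single pitch-tracking error feeds straight into the feature. The helper names below are hypothetical.

```python
import numpy as np


def local_jitter(periods) -> float:
    """Local jitter in percent: mean |T[i+1] - T[i]| divided by mean T."""
    t = np.asarray(periods, dtype=float)
    return float(100.0 * np.mean(np.abs(np.diff(t))) / np.mean(t))


def local_shimmer(amplitudes) -> float:
    """Local shimmer in percent: mean |A[i+1] - A[i]| divided by mean A."""
    a = np.asarray(amplitudes, dtype=float)
    return float(100.0 * np.mean(np.abs(np.diff(a))) / np.mean(a))


if __name__ == "__main__":
    true_periods_ms = [5.0, 5.0, 5.0, 5.0, 5.0]      # perfectly steady 200 Hz voice
    tracked_periods_ms = [5.0, 5.0, 10.0, 5.0, 5.0]  # same voice, one octave (period-doubling) error
    print(local_jitter(true_periods_ms))               # 0.0
    print(round(local_jitter(tracked_periods_ms), 2))  # 41.67
```

One octave error in five periods turns a perfectly steady voice into one with about 42% jitter, which illustrates the dependence on pitch estimation described above.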

Application Information

Patent type and authority: Application (China)
IPC(8): G10L15/02; G10L15/08
CPC: G10L15/02; G10L15/08; Y02T10/40
Inventors: 陶智 (Tao Zhi), 伍远博 (Wu Yuanbo), 孙宝印 (Sun Baoyin), 张晓俊 (Zhang Xiaojun), 周长伟 (Zhou Changwei), 范子琦 (Fan Ziqi)
Owner: SUZHOU UNIV