Tibetan language speech recognition method based on HMM and DNN

A Tibetan speech recognition technology, applicable to speech recognition, speech analysis, instruments, and related fields, that improves the efficiency of human-computer interaction.

Pending Publication Date: 2020-09-22
TIANJIN UNIV
Cites: 7 | Cited by: 2

AI Technical Summary

Problems solved by technology

However, there is currently no effective speech recognition system for Tibetan on the market.

Examples

Embodiment Construction

[0043] The HMM-DNN-based Tibetan speech recognition system is constructed through the following steps:

[0044] Step 1: Record Tibetan speech data and annotate the recordings to build a database.

[0045] Step 2: Prepare the data: organize the files required for model training, extract MFCC features, and apply cepstral mean and variance normalization (CMVN).
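The normalization named in Step 2 can be illustrated with a minimal sketch. It assumes per-utterance normalization over a list of feature vectors (one per frame); the function name `cmvn`, the `eps` guard, and the choice of per-utterance rather than per-speaker statistics are illustrative assumptions, not details given in the patent text.

```python
import math


def cmvn(frames, eps=1e-10):
    """Cepstral mean and variance normalization over one utterance.

    frames: list of equal-length feature vectors, one per frame.
    eps:    hypothetical guard against division by zero (an assumption).
    Returns a new list in which every coefficient has zero mean and
    (approximately) unit variance across the utterance.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    # Mean of each cepstral coefficient across all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Population standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    # Subtract the mean and divide by the standard deviation.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]


if __name__ == "__main__":
    toy = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]
    for row in cmvn(toy):
        print(row)
```

Normalizing each coefficient in this way reduces channel and recording-level differences between utterances, which is the usual motivation for applying CMVN before acoustic model training.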

[0046] In a speech recognition system, the first step is feature extraction. Information such as the pitch of a voice reflects a speaker's speech characteristics, and these characteristics are in turn reflected in the shape of the vocal tract. If that shape is known accurately, the phonemes being produced can be described accurately. The shape of the vocal tract shows up in the envelope of the short-time power spectrum of speech, and MFCCs (mel-frequency cepstral coefficients) are features that accurately describe this envelope.

[0047] First, the speech is pre-emphasized, split into frames, and windowed; then each short-time window is analyzed, ...
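The three preprocessing operations named in paragraph [0047] can be sketched as follows. The pre-emphasis coefficient of 0.97, the Hamming window, and the frame length and hop used in the example are common defaults assumed here for illustration; the patent text shown does not specify them, and all function names are hypothetical.

```python
import math


def pre_emphasize(signal, coeff=0.97):
    """Apply y[n] = x[n] - coeff * x[n-1] to boost high frequencies.
    coeff=0.97 is an assumed, commonly used value."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[i] - coeff * signal[i - 1] for i in range(1, len(signal))
    ]


def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames of frame_len samples,
    advancing hop samples each time; a trailing partial frame is dropped."""
    return [
        signal[start:start + frame_len]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def hamming(n):
    """Hamming window of length n (assumed window type)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def window_frames(frames):
    """Multiply every frame sample-by-sample with a Hamming window."""
    if not frames:
        return []
    w = hamming(len(frames[0]))
    return [[s * wi for s, wi in zip(frame, w)] for frame in frames]


if __name__ == "__main__":
    # Toy signal; real input would be audio samples, e.g. 25 ms frames
    # with a 10 ms hop at 16 kHz (frame_len=400, hop=160), also assumed.
    samples = [math.sin(0.1 * n) for n in range(1000)]
    frames = window_frames(frame_signal(pre_emphasize(samples), 400, 160))
    print(len(frames), len(frames[0]))
```

Each windowed frame would then be passed to the short-time spectral analysis that the truncated paragraph goes on to describe.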

Abstract

The invention relates to the field of artificial intelligence and provides a Tibetan speech recognition system based on an HMM-DNN (hidden Markov model-deep neural network). The method applies deep learning model training to Tibetan, a language with a low-resource corpus, to train a Tibetan-based model and recognize Tibetan speech, thereby improving the efficiency of human-computer interaction for Tibetan speakers. The method comprises the following steps: 1, recording Tibetan speech data; 2, preparing the data; 3, constructing a language model and a pronunciation dictionary; 4, training a monophone model; 5, training a triphone model; 6, performing linear discriminant analysis (LDA) and maximum likelihood linear transformation (MLLT), followed by decoding and alignment; 7, performing speaker adaptive training; and 8, carrying out model training. The method is mainly applied to automatic Tibetan speech recognition.

Description

Technical field

[0001] The present invention relates to the field of artificial intelligence, and in particular to a method and system for training a speech recognition model for Tibetan, a language with a low-resource corpus.

Background

[0002] Today, artificial intelligence, virtual reality, wearable devices, and similar fields have become frontiers and hotspots of technology industry research. These fields inevitably require human-computer interaction, and speech recognition is undoubtedly the most convenient and direct means of such interaction. Speech recognition is the process of having a computer understand human language and convert it into equivalent text.

[0003] For a long time, acoustic modeling in the field of speech recognition has used the GMM-HMM model (Gaussian mixture model-hidden Markov model), which offers reliable accuracy and has a mature expectation-maximization algorithm (EM algorithm...

Application Information

IPC(8): G10L15/00, G10L15/06, G10L15/14, G10L15/16, G10L25/24
CPC: G10L15/063, G10L15/005, G10L15/144, G10L15/16, G10L25/24
Inventors: 韩智丞, 魏建国, 吕绪康
Owner: TIANJIN UNIV