
Speech decoding method based on confusion network

A technology combining confusion networks and speech decoding, applied in the field of confusion-network-based speech decoding, which reduces the workload, shrinks the search network, and improves the decoding speed.

Active Publication Date: 2006-05-17
INST OF ACOUSTICS CHINESE ACAD OF SCI +1

Problems solved by technology

[0008] The purpose of the present invention is to overcome the deficiencies of the prior art: in the later stage of multi-pass decoding, without using more information (that is, without using more sophisticated and complex acoustic models and language models), to reduce the decoding error rate and increase the decoding speed through confusion network clustering, thereby providing a speech decoding method based on a confusion network.

Detailed Description of the Embodiments

[0035] The present invention will be further described below in conjunction with the accompanying drawings and preferred embodiments.

[0036] As shown in Figure 3, the speech decoding method based on a confusion network provided by the present invention comprises the following steps:

[0037] Step 101: Extract feature vector sequences from the input speech signal.
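As a minimal sketch of what Step 101 typically involves (the patent gives no code; the function name and parameters below are illustrative), the signal is sliced into overlapping windowed frames, the standard front end before computing MFCC, LPC or PLP feature vectors. The 20 ms frame length and 10 ms shift follow the convention cited in the Description below.

```python
# Illustrative front end for Step 101 (not the patent's implementation):
# split a speech signal into overlapping, Hamming-windowed frames, from
# which MFCC/LPC/PLP feature vectors would then be computed.
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=20.0, shift_ms=10.0):
    """Assumes len(signal) >= one frame. Returns (n_frames, frame_len)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # e.g. 320 samples
    shift = int(sample_rate * shift_ms / 1000)       # e.g. 160 samples
    n_frames = 1 + (len(signal) - frame_len) // shift
    window = np.hamming(frame_len)
    return np.stack([signal[i * shift : i * shift + frame_len] * window
                     for i in range(n_frames)])

frames = frame_signal(np.random.randn(16000))        # one second at 16 kHz
print(frames.shape)                                  # (99, 320)
```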

[0038] Step 102: Use the Viterbi-Beam search algorithm to perform a first-pass decoding of the speech features, output N-Best sentences or a word lattice, and at the same time obtain the acoustic-layer and language-layer probability scores of each word in the N-Best sentences or word lattice.
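To make the search concrete, here is a toy frame-synchronous Viterbi-Beam sketch (not the patent's decoder; the model and beam width are assumptions): states whose score falls more than `beam` below the frame's best score are pruned before the next frame is expanded. A real decoder would keep multiple back-pointers per state in order to produce N-Best output or a lattice rather than a single path.

```python
# Toy Viterbi-Beam search over an HMM (illustrative only).
import numpy as np

def viterbi_beam(log_obs, log_trans, beam=10.0):
    """log_obs: (T, S) per-frame state log-likelihoods;
    log_trans: (S, S) transition log-probs; uniform initial state assumed.
    Returns the single best state path."""
    T, S = log_obs.shape
    score = log_obs[0].copy()
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        score[score < score.max() - beam] = -np.inf   # beam pruning
        cand = score[:, None] + log_trans             # cand[i, j]: i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]
    path = [int(score.argmax())]                      # trace back best path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]
```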

[0039] Step 103: If the intermediate result output in Step 102 is N-Best sentences, compress them into a directed network structure with a merging algorithm; the flow of this merging algorithm is shown in Figure 4 and is prior art, so it is not described in detail here (a generic sketch follows below). If the ...
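Since the patent treats the merging algorithm as prior art and does not detail it, the following is only a generic illustration of the idea, with hypothetical names: N-Best word strings are folded into a directed network by sharing common prefixes, so repeated words are stored once.

```python
# Generic prefix-sharing merge of N-Best sentences into a directed network
# (illustrative; Figure 4 of the patent shows the actual prior-art flow).
def nbest_to_network(nbest):
    """nbest: list of word lists. Returns (edges, finals): edges maps
    (node, word) -> next node in a network rooted at node 0."""
    edges, finals, next_node = {}, set(), 1
    for sentence in nbest:
        node = 0
        for word in sentence:
            if (node, word) not in edges:     # reuse shared prefixes
                edges[(node, word)] = next_node
                next_node += 1
            node = edges[(node, word)]
        finals.add(node)
    return edges, finals

edges, finals = nbest_to_network(
    [["the", "cat", "sat"], ["the", "cat", "sat"], ["the", "mat", "sat"]])
print(len(edges))  # 5 distinct arcs instead of 9 word tokens
```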

Abstract

A method for decoding speech based on a confusion network includes: performing depth-first frame-synchronous Viterbi-Beam search on the speech features and outputting N-Best sentences; generating a confusion network by two-stage clustering of the N-Best sentences according to time and a phoneme-similarity algorithm; and matching and searching out the optimal result on the confusion network using the maximum posterior probability as the criterion.
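As a hedged illustration of the abstract's final step (the bin structure and words below are invented for the example): once the lattice hypotheses have been clustered into a chain of confusion bins, the consensus output simply keeps the maximum-posterior word of each bin, treating an epsilon entry as an optional deletion.

```python
# Consensus decoding over a confusion network (illustrative sketch).
def consensus(bins):
    """bins: list of dicts mapping word -> posterior probability.
    Returns the maximum-posterior word sequence, skipping epsilon arcs."""
    result = []
    for posteriors in bins:
        word = max(posteriors, key=posteriors.get)
        if word != "<eps>":            # "<eps>" marks a possible deletion
            result.append(word)
    return result

bins = [{"the": 0.9, "a": 0.1},
        {"cat": 0.55, "mat": 0.45},
        {"sat": 0.7, "<eps>": 0.3}]
print(consensus(bins))                 # ['the', 'cat', 'sat']
```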

Description

Technical Field

[0001] The invention belongs to the field of speech recognition, and in particular relates to a speech decoding method based on a confusion network.

Background Technique

[0002] The decoding process, also known as the recognition process, is an important part of a speech recognition system. Its function is: under given acoustic and language models, for an input acoustic feature vector sequence, to automatically search for the optimal matching word string in a certain search space, finally converting the speech signal into text information.

[0003] Figure 1 is a structural diagram of a known speech recognition system. As shown in the figure, the feature extraction module processes the input speech signal in frames, usually with a frame length of 20 ms and a frame shift of 10 ms; commonly used features include MFCC, LPC and PLP features. After feature extraction, the speech signal is transformed into a sequence of feature v...

Application Information

IPC(8): G10L15/26, G10L15/02, G10L15/08
Inventors: 吕萍, 颜永红, 潘接林, 韩疆
Owner: INST OF ACOUSTICS CHINESE ACAD OF SCI