External corpus speech identification method based on deep convolutional neural network

A neural network and deep convolution technology, applied in the field of external corpus speech recognition, solving the problems that the original speech signal features are not used and an optimal solution cannot be obtained, and achieving the effects of improved sentence recognition accuracy, a strengthened overall recognition rate, and high recognition accuracy.

Active Publication Date: 2018-12-21
Owner: 北京和鸿盈科技术有限公司

AI Technical Summary

Problems solved by technology

However, the disadvantage is that the two-stage recognition model based on an acoustic model and a language model trains the two stages separately in series, so the language model does not use the original speech signal features and an optimal overall solution cannot be obtained.
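To make this limitation concrete, the following is a minimal sketch (illustrative only, not taken from the patent) of how such a two-stage pipeline is wired: the acoustic model and the language model are built separately, so the language model only ever sees the acoustic model's discrete output and never the original speech features.

```python
# Toy illustration (not from the patent) of a conventional two-stage pipeline:
# the language model only receives the acoustic model's discrete output and
# never the original speech features, so the stages cannot be tuned jointly.

def acoustic_model(feature_frames):
    # Stage 1: map each acoustic feature frame to its most likely phoneme.
    # A trivial argmax over per-frame scores stands in for a trained model.
    return [max(frame, key=frame.get) for frame in feature_frames]

def language_model(phonemes, lexicon):
    # Stage 2: map the phoneme sequence to text with a separately built
    # lexicon/language model; the raw features are no longer available here.
    return lexicon.get(tuple(phonemes), "<unk>")

# Hypothetical per-frame phoneme scores and a tiny lexicon.
frames = [{"n": 0.9, "l": 0.1}, {"i3": 0.8, "u3": 0.2}]
lexicon = {("n", "i3"): "你"}

print(language_model(acoustic_model(frames), lexicon))  # prints 你
```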




Embodiment Construction

[0047] The present invention will be further described below in conjunction with the accompanying drawings.

[0048] As shown in Figures 1-3, a deep convolutional neural network-based external corpus speech recognition method is implemented in the following steps:

[0049] Step 1. Obtain speech annotation data and an Internet corpus.

[0050] 1-1 The speech annotation data are recordings of spoken paragraphs; the Chinese character sequence, pinyin sequence, and phoneme sequence corresponding to the speech annotation data are obtained from them by manual transcription.
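As an illustration only (the patent does not specify a data format), one annotated recording from step 1-1 can be held as three parallel sequences alongside the audio; the field names below are assumptions.

```python
# Hypothetical container for one item of speech annotation data (step 1-1):
# the recording plus its manually transcribed Chinese character, pinyin and
# phoneme sequences. Field names are illustrative, not from the patent.
from dataclasses import dataclass
from typing import List

@dataclass
class AnnotatedUtterance:
    waveform: List[float]   # raw samples of the recorded paragraph
    characters: List[str]   # Chinese character sequence
    pinyin: List[str]       # pinyin sequence, one syllable per character
    phonemes: List[str]     # phoneme sequence obtained by splitting each pinyin

# Example with the two-character fragment "你好".
utt = AnnotatedUtterance(
    waveform=[0.0] * 16000,                  # placeholder audio samples
    characters=["你", "好"],
    pinyin=["ni3", "hao3"],
    phonemes=["n", "i3", "h", "a", "o3"],
)
```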

[0051] 1-2 Each Chinese character has a pinyin, and one pinyin may correspond to multiple Chinese characters. Specifically, a pinyin is split into an initial and a final; in the same way, initials and finals are split into phonemes, and multiple phonemes may correspond to one initial or final.
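A minimal sketch of the decomposition described in 1-2, with tiny lookup tables standing in for the full pinyin inventory; the table contents are illustrative assumptions, not data from the patent.

```python
# Toy decomposition of pinyin into initial + final and then phonemes (1-2).
# The tables cover only a few syllables and are illustrative assumptions.
INITIALS = ["zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s"]

def split_pinyin(syllable: str):
    """Split one pinyin syllable into (initial, final)."""
    for ini in INITIALS:                 # multi-letter initials listed first
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable                  # syllables with no initial, e.g. "an"

# One final may expand to several phonemes; this mapping is illustrative.
FINAL_TO_PHONEMES = {"ao3": ["a", "o3"], "i3": ["i3"], "an1": ["a", "n1"]}

def pinyin_to_phonemes(syllable: str):
    ini, fin = split_pinyin(syllable)
    return ([ini] if ini else []) + FINAL_TO_PHONEMES.get(fin, [fin])

print(split_pinyin("hao3"))        # ('h', 'ao3')
print(pinyin_to_phonemes("hao3"))  # ['h', 'a', 'o3']
```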

[0052] 1-3 When obtaining the speech annotation data, the f...


Abstract

The invention discloses an external corpus speech recognition method based on a deep convolutional neural network. The method comprises the following steps: S1, speech annotation data and an Internet corpus are obtained; S2, the speech signal data are cleaned using the average energy of the speech signal data; S3, feature extraction and normalization of the speech annotation data are performed; S4, a neural network model is constructed; and S5, the speech data to be tested are input into the constructed neural network model, and recognized text data are output after recognition is completed. The method has the advantages that a deep convolutional conditional random field model can be constructed on the speech signal data; compared with a general deep learning model, less labeled speech data is required; the cheap, large-scale unlabeled Internet corpus is fully utilized to strengthen the overall recognition rate of sentences and improve sentence recognition accuracy; and the two processes are integrated, realizing an end-to-end speech recognition method.
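The abstract does not spell out the cleaning rule in step S2; the sketch below is one plausible reading, dropping frames whose short-time energy falls well below the utterance's average frame energy. The frame length and threshold factor are assumptions, not values taken from the patent.

```python
import numpy as np

def clean_by_average_energy(signal: np.ndarray,
                            frame_len: int = 400,
                            factor: float = 0.1) -> np.ndarray:
    """One plausible reading of step S2: keep only frames whose energy is at
    least `factor` times the signal's average frame energy. Frame length and
    threshold factor are illustrative assumptions, not from the patent."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    energy = (frames ** 2).mean(axis=1)          # short-time average energy
    keep = energy >= factor * energy.mean()      # drop near-silent frames
    return frames[keep].reshape(-1)

# Usage: silence-padded noise loses its low-energy frames.
sig = np.concatenate([np.zeros(800), np.random.randn(1600), np.zeros(800)])
print(len(sig), len(clean_by_average_energy(sig)))   # prints 3200 1600
```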

Description

Technical field

[0001] The invention relates to the field of speech signal processing, in particular to an external corpus speech recognition method based on a deep convolutional neural network.

Background technique

[0002] At present, there are two main categories of speech recognition methods: end-to-end speech recognition, and two-stage recognition models based on acoustic models and language models. End-to-end speech recognition is trained on large-scale speech annotation data; the input is a speech signal, and the output is the text corresponding to the speech. The advantage of this method is that it is an end-to-end recognition system that does not require human experts to design business rules, and it makes full use of large-scale data and the feature-learning capability of deep models. The disadvantage is that it requires huge amounts of training data: such methods often require tens of thousands of hours of speech data, and di...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G10L15/16; G10L15/18
CPC: G10L15/16; G10L15/18; G10L15/1807
Inventor: 傅啸; 张桂军
Owner: 北京和鸿盈科技术有限公司