
Method, device and electronic equipment for generating a speech recognition model

A technology for generating a speech recognition model from speech frames, applied in speech recognition, speech analysis, instruments, etc. It addresses problems such as difficulty in parallel computing, error accumulation, and high consumption of computing resources, so as to improve recognition accuracy and recognition effect and to alleviate error accumulation.

Active Publication Date: 2022-04-08
BEIJING DAJIA INTERNET INFORMATION TECH CO LTD
3 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0003] However, the current end-to-end framework consumes substantial computing resources and is difficult to parallelize. In addition, when speech recognition is performed with the speech recognition neural network model, the output at the previous moment may be wrong, and such errors accumulate. As a result, the recognition accuracy of the model is low and the recognition effect is poor.

Examples

Embodiment 1

[0057] At present, speech recognition using an end-to-end framework based on the encoder-decoder attention mechanism still has the following defects:

[0058] On the one hand, the encoding and decoding functions in current speech recognition neural network models are all implemented with recurrent neural network structures, which consume substantial computing resources and are difficult to parallelize;

[0059] On the other hand, when a current speech recognition neural network model is trained, the labeled text data corresponding to the input speech frames guarantees that the output at the previous moment is always correct. The training process therefore does not consider how the model can still produce a correct result when the output at the previous moment is wrong. As a result, when using the trained model for speech recognition, there will be an output error at the previous moment,...

Embodiment 2

[0101] Based on the same inventive concept, an embodiment of the present disclosure also provides a device for generating a speech recognition model. Since this device corresponds to the method in the embodiments of the present disclosure and solves the problem on a similar principle, its implementation can refer to the implementation of the method, and repeated details are omitted.

[0102] As shown in Figure 4, the speech recognition model includes an encoder and a decoder, and the device includes a sample acquisition unit 400, an encoder training unit 401, and a decoder training unit 402, wherein:

[0103] The sample acquisition unit 400 is configured to obtain training samples, each training sample including a speech frame sequence and a corresponding labeled text sequence;

[0104] The encoder training unit 401 is configured to use the speech frame sequence as the input feature of the encoder, a...

Embodiment 3

[0116] Based on the same inventive concept, an embodiment of the present disclosure also provides an electronic device. Since this electronic device corresponds to the method in the embodiments of the present disclosure and solves the problem on a similar principle, its implementation can refer to the implementation of the method, and repeated details are omitted.

[0117] As shown in Figure 5, the electronic device includes:

[0118] A processor 500;

[0119] A memory 501 for storing instructions executable by the processor 500;

[0120] Wherein, the processor 500 is configured to execute the instructions to implement the following steps:

[0121] Obtaining training samples, each training sample including a speech frame sequence and a corresponding labeled text sequence;

[0122] Using the speech frame sequence as the input feature of the encoder, using the spee...

Abstract

The disclosure relates to a method, device and electronic equipment for generating a speech recognition model, used to improve the recognition accuracy and recognition effect of the model. The method includes: obtaining training samples, each training sample including a speech frame sequence and a corresponding labeled text sequence; training the encoder, with the speech frame sequence as the input feature of the encoder and the speech coded frame of the speech frame sequence as the output feature of the encoder; training the decoder, with the speech coded frame as the input feature of the decoder and the labeled text sequence corresponding to the speech frame sequence as the output feature, to obtain the current predicted text sequence; and training the decoder again, with the speech coded frame as the input feature of the decoder and, as the output feature, the sequence obtained by sampling the labeled text sequence corresponding to the speech frame sequence and the current predicted text sequence according to a preset probability and combining the result.
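The combining step in the abstract can be illustrated with a minimal Python sketch. It assumes that sampling happens independently at each position and that the labeled (marked) and predicted sequences are already aligned to the same length; the abstract does not state either detail. The names `mix_targets` and `p_predicted` are hypothetical and chosen only for this illustration.

```python
import random
from typing import List, Optional, Sequence


def mix_targets(
    labeled: Sequence[str],
    predicted: Sequence[str],
    p_predicted: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Combine two aligned token sequences position by position.

    With probability p_predicted a position takes the model's current
    prediction; otherwise it keeps the labeled token. Per-position
    sampling is an assumption, not something the source specifies.
    """
    if len(labeled) != len(predicted):
        raise ValueError("labeled and predicted must have the same length")
    if not 0.0 <= p_predicted <= 1.0:
        raise ValueError("p_predicted must be in [0, 1]")
    rng = rng or random.Random()
    return [
        pred if rng.random() < p_predicted else gold
        for gold, pred in zip(labeled, predicted)
    ]


if __name__ == "__main__":
    labeled = ["how", "are", "you", "today"]
    predicted = ["how", "art", "you", "two"]
    # A seeded generator makes the mixed sequence reproducible between runs.
    print(mix_targets(labeled, predicted, p_predicted=0.3, rng=random.Random(0)))
```

Setting `p_predicted` to 0 reproduces training on the labeled text alone, while larger values expose the decoder to more of its own (possibly wrong) earlier outputs during the second training pass.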

Description

Technical field

[0001] The present disclosure relates to the technical field of speech recognition, in particular to a method, device and electronic equipment for generating a speech recognition model.

Background

[0002] The current mainstream speech recognition framework is an end-to-end framework based on the encoder-decoder attention mechanism, such as the Listen, Attend and Spell (LAS) speech recognition neural network model, which comprises three functions: encoding, decoding, and an attention mechanism. Encoding models the feature frames of the speech to obtain a high-level acoustic representation. Decoding models the language information: given the output at the previous moment, it combines the acoustic representation to predict the output at the current moment. The attention mechanism links language and acoustics, extracting from the acoustic representation the content relevant to the current language output. This m...
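As a rough illustration of how an attention mechanism extracts relevant content from the acoustic representation, the sketch below scores each encoded frame against a decoder state and returns a weighted average. Plain dot-product scoring is an assumption made for brevity; the source does not say which scoring function is used, and the names `attend`, `query`, and `encoded_frames` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def attend(
    query: Sequence[float],
    encoded_frames: Sequence[Sequence[float]],
) -> Tuple[List[float], List[float]]:
    """Return (weights, context) for one decoding step.

    query:          decoder state for the current output position
    encoded_frames: one vector per encoded speech frame
    """
    # Dot-product relevance score of every frame (assumed scoring function).
    scores = [sum(q * h for q, h in zip(query, frame)) for frame in encoded_frames]
    # Numerically stable softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: weighted average of the encoded frames.
    dim = len(encoded_frames[0])
    context = [
        sum(w * frame[i] for w, frame in zip(weights, encoded_frames))
        for i in range(dim)
    ]
    return weights, context


if __name__ == "__main__":
    frames = [[10.0, 0.0], [0.0, 10.0]]
    weights, context = attend([1.0, 0.0], frames)
    print(weights, context)  # the first frame dominates for this query
```

The decoder would then combine the context vector with the output at the previous moment to predict the current output.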

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G10L15/06, G10L15/02, G10L19/005, G10L19/04, G10L25/24
CPC: G10L15/063, G10L15/02, G10L19/04, G10L19/005, G10L25/24, G10L15/16, G10L2015/027
Inventor: 赵媛媛, 李杰, 王晓瑞, 李岩
Owner: BEIJING DAJIA INTERNET INFORMATION TECH CO LTD