Recurrent neural network language model training method, device, equipment and medium

A neural network and language model technology, applied in the field of artificial intelligence, which addresses problems such as the storage and computing costs of large models hindering their use in real-time applications.

Active Publication Date: 2019-07-23
MOBVOI INFORMATION TECH CO LTD

AI Technical Summary

Problems solved by technology

[0004] However, in pursuit of better language expressiveness, large RNNLM models are often required, and it is precisely the large storage footprint and enormous computational cost of large RNNLMs that hinder their application in real-time scenarios.



Examples


Embodiment 1

[0026] Figure 1 is a flow chart of a recurrent neural network language model training method provided in Embodiment 1 of the present invention. This embodiment is applicable to training a recurrent neural network language model used for language text recognition. The method can be performed by a recurrent neural network language model training device, and specifically includes the following steps:

[0027] S110. Input the language text in the corpus into the trained high-rank recurrent neural network language model (RNNLM) and into the lightweight RNNLM to be trained, respectively.
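
To make step S110 concrete, the following is a minimal PyTorch sketch (not the patent's own code) of a high-rank teacher RNNLM and a lightweight student RNNLM receiving the same batch of language text; the `RNNLM` class name, the layer sizes, and the batch shape are illustrative assumptions.

```python
# Illustrative sketch only; model sizes and names are assumptions, not the patent's values.
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """A simple LSTM language model: embedding -> LSTM -> vocabulary logits."""
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, num_layers, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.rnn(self.embed(tokens))
        return self.proj(hidden)  # logits over the vocabulary at each position

vocab_size = 10000  # PTB vocabulary size mentioned in paragraph [0028]

# Trained high-rank (teacher) RNNLM: larger hidden state, parameters frozen.
teacher = RNNLM(vocab_size, embed_dim=512, hidden_dim=1024, num_layers=2).eval()
for p in teacher.parameters():
    p.requires_grad_(False)

# Lightweight (student) RNNLM to be trained: smaller hidden state.
student = RNNLM(vocab_size, embed_dim=128, hidden_dim=256, num_layers=1)

# Step S110: the same batch of language text (token ids) is fed into both models.
tokens = torch.randint(0, vocab_size, (8, 35))  # (batch, sequence length), dummy data
teacher_logits = teacher(tokens)
student_logits = student(tokens)
```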

[0028] In this embodiment, the corpus includes the Penn Treebank (PTB) corpus and/or the Wall Street Journal (WSJ) corpus. The PTB corpus contains 24 parts in total, its vocabulary size is limited to 10,000, and a dedicated label indicates out-of-vocabulary words. Part or all of the PTB corpus is selected as the training set, and the language text in the training set i...
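
A hypothetical preprocessing sketch consistent with the PTB setup described above (vocabulary capped at 10,000 words, out-of-vocabulary words mapped to a dedicated label); the function names and the `<unk>` token are assumptions, since the excerpt does not show the actual preprocessing.

```python
# Hypothetical corpus preprocessing; the OOV token name and helpers are assumed for illustration.
from collections import Counter

def build_vocab(sentences, max_size=10000, oov_token="<unk>"):
    """Keep the most frequent words up to max_size; everything else maps to oov_token."""
    counts = Counter(word for sentence in sentences for word in sentence.split())
    vocab = {oov_token: 0}
    for word, _ in counts.most_common(max_size - 1):
        vocab[word] = len(vocab)
    return vocab

def encode(sentence, vocab, oov_token="<unk>"):
    """Map words to ids, replacing out-of-vocabulary words with the OOV label."""
    return [vocab.get(word, vocab[oov_token]) for word in sentence.split()]

sentences = ["the market reacted to the report", "prices rose sharply"]
vocab = build_vocab(sentences)
print(encode("the market fell", vocab))  # unseen word "fell" maps to the OOV id 0
```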

Embodiment 2

[0042] During model training, it was found that the training process of the student model still has the following two defects. First, in the language model, each training data label vector represents a degenerate data distribution, which assigns the likelihood of the corresponding language text to a single category. Compared with the probability distribution produced by the teacher model over all training data, that is, the probability that the corresponding language text falls on each label, the degenerate data distribution is noisier and more localized. Second, unlike previous experimental results for knowledge distillation in acoustic modeling and image recognition, the experiments on language text recognition in this embodiment found that when the cross-entropy loss and the KL divergence have fixed weights, minimizing the weighted sum of the cross-entropy loss and the KL divergence yields a student model that is inferior to the one obtained by minimizing the KL ...
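
A minimal sketch of the distillation objective this embodiment discusses: a weighted sum of the cross-entropy loss against the training-data labels and the KL divergence against the teacher's output distribution. The `kl_weight` parameterization is an assumption; the excerpt is truncated before describing how the patent actually handles the weighting, and it only reports that a fixed-weight sum underperformed pure KL minimization.

```python
# Sketch of the weighted distillation objective; the weighting scheme is an assumption,
# since the patent text is truncated at that point.
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, kl_weight):
    """Weighted sum of cross-entropy (vs. one-hot labels) and KL divergence (vs. teacher)."""
    vocab = student_logits.size(-1)
    # Cross-entropy of the student's output vector w.r.t. the training-data label vector.
    ce = F.cross_entropy(student_logits.reshape(-1, vocab), labels.reshape(-1))
    # KL divergence of the student's output vector w.r.t. the teacher's output vector.
    kl = F.kl_div(
        F.log_softmax(student_logits.reshape(-1, vocab), dim=-1),
        F.softmax(teacher_logits.reshape(-1, vocab), dim=-1),
        reduction="batchmean",
    )
    return (1.0 - kl_weight) * ce + kl_weight * kl
```

With `kl_weight = 1.0` this reduces to pure KL minimization, and with `kl_weight = 0.0` to ordinary cross-entropy training on the labels alone.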

Embodiment 3

[0098] Figure 3 is a schematic structural diagram of a recurrent neural network language model training device provided in Embodiment 3 of the present invention. As shown in Figure 3, the device includes an input module 31 and a minimization module 32.

[0099] The input module 31 is used to input the language text in the corpus into the high-rank recurrent neural network language model RNNLM and into the lightweight RNNLM to be trained, respectively;

[0100] The minimization module 32 is used to iterate the parameters in the lightweight RNNLM to minimize the weighted sum of the cross-entropy loss and the Kullback-Leibler divergence, so as to complete the training of the lightweight RNNLM;

[0101] Here, the cross-entropy loss is the cross-entropy loss of the output vector of the lightweight RNNLM relative to the training data label vector of the language text, and the Kullback-Leibler divergence is the Kullback-Leibler divergence of the output vector of the lightweight RNNLM relative to the output vector of t...
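
As a rough illustration of how the minimization module 32 could operate, the following sketch (reusing `teacher`, `student`, and `distillation_loss` from the earlier sketches) iterates the lightweight RNNLM's parameters to minimize the weighted sum; the optimizer choice, learning rate, and fixed `kl_weight` are assumptions.

```python
# Minimal training-step sketch for the minimization module (module 32);
# optimizer, learning rate, and kl_weight are illustrative assumptions.
import torch

optimizer = torch.optim.SGD(student.parameters(), lr=0.1)

def training_step(tokens, labels, kl_weight=0.5):
    """One iteration over the lightweight RNNLM's parameters: minimize the
    weighted sum of cross-entropy loss and Kullback-Leibler divergence."""
    with torch.no_grad():
        teacher_logits = teacher(tokens)   # high-rank RNNLM output vectors (fixed)
    student_logits = student(tokens)       # lightweight RNNLM output vectors
    loss = distillation_loss(student_logits, teacher_logits, labels, kl_weight)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```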


Abstract

The embodiments of the invention disclose a recurrent neural network language model training method, device, equipment and medium. The method includes: inputting the language text in the corpus into the trained high-rank recurrent neural network language model RNNLM and into the lightweight RNNLM to be trained, respectively; and iterating the parameters in the lightweight RNNLM to minimize the weighted sum of the cross-entropy loss and the Kullback-Leibler divergence, so as to complete the training of the lightweight RNNLM. Here, the cross-entropy loss is the cross-entropy loss of the output vector of the lightweight RNNLM relative to the training data label vector, and the Kullback-Leibler divergence is the Kullback-Leibler divergence of the output vector of the lightweight RNNLM relative to the output vector of the high-rank RNNLM. The method provided in this embodiment can effectively reduce the scale of the RNNLM.

Description

Technical Field

[0001] The embodiments of the present invention relate to the field of artificial intelligence, and in particular to a recurrent neural network language model training method, device, equipment and medium.

Background Technique

[0002] The Recurrent Neural Network (RNN) has a large storage capacity and strong computing power, which gives it great advantages over traditional language modeling methods, and it is now widely used in language modeling.

[0003] The Recurrent Neural Network Language Model (RNNLM) is a model proposed by Mikolov in 2010. By using a Recurrent Neural Network (RNN) to train the language model, better expressiveness can be obtained. The RNNLM represents each word in a continuous, low-dimensional space and can represent historical information of various lengths through a recurrent vector.

[0004] However, in pursuit of better language expressiveness, large RNNLM models are often required, and it is precisely because of the la...


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06N3/04
CPC: G06N3/045
Inventors: 施阳阳, 黄美玉, 雷欣
Owner: MOBVOI INFORMATION TECH CO LTD