Language model compression method based on uncertainty estimation knowledge distillation

An uncertainty-estimation knowledge distillation technique, applied to the compression of pre-trained language models. It addresses the low compression rate, low efficiency, and heavy computational burden of existing network compression methods, reducing the number of parameters while improving training efficiency and inference performance.

Pending Publication Date: 2022-07-29
XIDIAN UNIV
0 Cites, 3 Cited by

AI Technical Summary

Problems solved by technology

The existing scheme was verified on the Chinese named entity recognition (NER) task. Although the computational cost of that compression method is modest, the accuracy of the compressed model drops by 1-2 percentage points relative to the original network and the compression rate is low, which reduces runtime efficiency and wastes a large amount of computing resources.
[0006] The existing network lightweighting methods described above share three shortcomings: 1) no supervision of the network's intermediate inference process, 2) insufficient use of the original network's parameters, and 3) no noise estimation during knowledge distillation.
As a result, the compression process is computationally heavy and inefficient, and the compressed network is less accurate.
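The third shortcoming, missing noise estimation during distillation, can be illustrated with a small sketch. The visible text does not give the patent's loss formula, so the form below is an assumption: a commonly used uncertainty-weighted regression loss in which the student predicts a log-variance `s_i` alongside each output and the squared error against the teacher is scaled by `exp(-s_i)`. The function name `uncertainty_weighted_mse` and the sample values are hypothetical.

```python
import math


def uncertainty_weighted_mse(student, teacher, log_var):
    """Mean over i of exp(-s_i) * (student_i - teacher_i)^2 + s_i.

    ASSUMPTION: this is a generic uncertainty-weighted loss, not the
    formula from the patent. s_i = log_var[i] is a predicted log-variance.
    """
    if not (len(student) == len(teacher) == len(log_var)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for y, t, s in zip(student, teacher, log_var):
        # A large s_i down-weights a noisy target; the "+ s" term
        # penalizes claiming high uncertainty everywhere.
        total += math.exp(-s) * (y - t) ** 2 + s
    return total / len(student)


# Hypothetical values: the third teacher target is treated as noisy.
student = [0.9, 0.1, 2.0]
teacher = [1.0, 0.0, 0.0]

confident = uncertainty_weighted_mse(student, teacher, [0.0, 0.0, 0.0])
hedged = uncertainty_weighted_mse(student, teacher, [0.0, 0.0, math.log(4.0)])
print(confident, hedged)
```

With all log-variances at zero the loss reduces to a plain mean squared error. Raising the predicted variance on the badly matched third element lowers the total loss, which is how such a term keeps noisy teacher signals from dominating training.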

Examples

Detailed Description of the Embodiments

[0031] The present invention will be described in further detail below with reference to the accompanying drawings and specific embodiments.

[0032] Referring to Figure 1, the pre-trained language model compression method based on uncertainty estimation knowledge distillation is implemented in the following steps:

[0033] Step 1. Obtain training and testing datasets.

[0034] Obtain datasets from GLUE (General Language Understanding Evaluation), a public benchmark for natural language understanding. The benchmark covers a range of common natural language processing tasks, so it tests the overall performance of a language model well.

[0035] This example uses the following four datasets from the benchmark for the subsequent experiments:

[0036] First, the Corpus of Linguistic Acceptability (CoLA) is a single-sentence classification task. Its corpus is drawn from books and journal articles on linguistic theory, and each word sequence is labeled as grammatical or not;

[0037] Secon...

Abstract

The invention discloses a language model compression method based on uncertainty estimation knowledge distillation. It mainly addresses the high training cost, low speed, and noise interference during knowledge distillation in existing network compression techniques. The method comprises the following steps: 1) compressing the original language model by half to obtain a compressed neural network; 2) initializing the parameters of the compressed neural network from the original language model; 3) adding a parameter distillation loss function for the feedforward network structure, and designing an uncertainty estimation loss function and a cross-entropy loss function for the natural language processing task; and 4) training the compressed neural network with the designed loss functions. The method reduces the computation required for compression training, improves the network compression rate, and speeds up network inference. It can be widely applied to model deployment and model compression tasks, and it offers a new model compression solution for application scenarios with limited hardware resources.
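Steps 1 and 2 of the abstract can be sketched in a few lines. This is a minimal illustration, not the patent's procedure: the visible text does not say which layers are kept or how weights are mapped, so the sketch assumes the student keeps every second teacher layer and copies that layer's weights. The names `halve_and_initialize` and `count_params`, and the toy 12-layer model, are hypothetical.

```python
import copy


def halve_and_initialize(teacher_layers):
    """Build a student with half as many layers as the teacher.

    ASSUMPTION: the student keeps teacher layers 2, 4, 6, ... and starts
    from copies of their weights. The source only says the model is
    compressed by half and initialized from the original model.
    """
    if len(teacher_layers) % 2 != 0:
        raise ValueError("expected an even number of teacher layers")
    # deepcopy so that training the student never mutates the teacher
    return [copy.deepcopy(layer) for layer in teacher_layers[1::2]]


def count_params(layers):
    """Count scalar weights across all layers."""
    return sum(len(weights) for layer in layers for weights in layer.values())


# Toy 12-layer "teacher": each layer holds flat weight lists.
teacher = [
    {"attention": [0.1 * i] * 4, "ffn": [0.2 * i] * 8}
    for i in range(12)
]
student = halve_and_initialize(teacher)

print(len(teacher), "->", len(student), "layers")
print(count_params(teacher), "->", count_params(student), "parameters")
```

On the toy model this halves both the layer count (12 to 6) and the parameter count (144 to 72). Starting from copied teacher weights, rather than random values, is one way to make fuller use of the original network's parameters before the distillation losses in steps 3 and 4 are applied.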

Description

Technical field

[0001] The invention belongs to the field of neural network compression, and in particular relates to a compression method for pre-trained language models, which can be used for model deployment, model compression, and reducing the hardware burden of models.

Background

[0002] In recent years, the natural language processing research community has witnessed a revolution in pre-training and self-supervised models. With the research and application of large-scale pre-trained language models, a model is first pre-trained on large-scale text data and then transferred to downstream tasks, and pre-training followed by fine-tuning has gradually become the basic paradigm for natural language processing solutions. The emergence of BERT has significantly improved performance on many natural language processing tasks. However, pre-trained language models are often computationally expensive and memory intensive. These models typically have hundreds of millions of parame...

Application Information

IPC(8): G06N3/08, G06N3/04
CPC: G06N3/082, G06N3/048, G06N3/045
Inventor: 董伟生, 黄天瑜, 毋芳芳, 石光明
Owner: XIDIAN UNIV