A Model Distillation Method Combined with Dynamic Vocabulary Augmentation

A vocabulary-enhancement and distillation technology, applied to computing models, machine learning, instruments, etc., which solves the problems of model inference depending on high-end hardware, declining model accuracy, and excessive model size, achieving the effects of improved semantic understanding, fast inference, and low resource consumption.

Active Publication Date: 2021-06-18
达而观数据(成都)有限公司

AI Technical Summary

Problems solved by technology

This leaves these models, which have repeatedly broken records in academia, facing great difficulties in industrial deployment: the model is too large and has too many parameters, so the training process consumes too many resources, inference depends too heavily on high-end hardware, inference is too slow, and so on.
For example, Huawei released TinyBert [1] in October of this year. By distilling the Bert model, the model size was reduced by a factor of 7.5 and the inference speed was increased by a factor of 9.4; however, the accuracy of the model dropped by 3.2%, as some semantic information is lost in the distillation process.




Detailed Description of the Embodiments

[0019] The present invention will be further described below in conjunction with the accompanying drawings.

[0020] To make the technical solution of this embodiment clear, the technical terms mentioned in it are explained below:

[0021] Encode: denotes encoding.

[0022] Token: denotes a token, i.e., a unit of segmented text.

[0023] CRF: stands for Conditional Random Field.

[0024] GPU: stands for graphics processing unit.

[0025] This embodiment provides a model distillation method combined with dynamic vocabulary enhancement. The method applies model distillation and adds dictionary information during the fine-tuning process, so as to reduce the size of the student model while improving its accuracy. The overall workflow is shown in Figure 1; the specific steps are as follows:

[0026] First, the ALbert language model is fine-tuned, and this differs from conventional fine-tuning logic. In the process of fine-tuning the ALbert language model, the fine-...
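The visible text is cut off here, but the abstract below describes the core of this step: features derived from dictionary information are combined with the output features of the language model before fine-tuning proceeds. The following is a minimal PyTorch sketch of that fusion step, not the patent's own implementation; the toy encoder stands in for the fine-tuned ALbert model, and all names (DictAugmentedTagger, dictionary_features, lexicon) and the one-dimensional match feature are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Placeholder for the fine-tuned ALbert encoder (illustrative only)."""
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(token_ids)  # (batch, seq_len, hidden_size)

class DictAugmentedTagger(nn.Module):
    """Concatenates per-token dictionary features with the language model's
    output features, then classifies each token."""
    def __init__(self, encoder: nn.Module, hidden_size: int, dict_dim: int, num_labels: int):
        super().__init__()
        self.encoder = encoder
        self.fuse = nn.Linear(hidden_size + dict_dim, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)  # a CRF layer could sit here instead

    def forward(self, token_ids: torch.Tensor, dict_feats: torch.Tensor) -> torch.Tensor:
        h = self.encoder(token_ids)                   # language-model output features
        fused = torch.tanh(self.fuse(torch.cat([h, dict_feats], dim=-1)))
        return self.classifier(fused)                 # per-token label logits

def dictionary_features(tokens: list[str], lexicon: set[str]) -> torch.Tensor:
    """1.0 where a token appears in the domain dictionary, else 0.0 (one feature per token)."""
    return torch.tensor([[[1.0] if t in lexicon else [0.0] for t in tokens]])

# usage: four tokens, two of which hit the (hypothetical) domain dictionary
model = DictAugmentedTagger(ToyEncoder(vocab_size=100, hidden_size=32),
                            hidden_size=32, dict_dim=1, num_labels=5)
ids = torch.randint(0, 100, (1, 4))
feats = dictionary_features(["contract", "party", "the", "price"], {"contract", "party"})
logits = model(ids, feats)  # (1, 4, 5)
```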



Abstract

The invention relates to the technical field of natural language processing within artificial intelligence, and discloses a model distillation method combined with dynamic vocabulary enhancement. The method includes: on the basis of the ALbert language model, adjusting the language model through fine-tuning technology combined with dynamic vocabulary enhancement technology to obtain a fine-tuned language model, which serves as the teacher model. When fine-tuning the language model, the procedure differs from conventional fine-tuning logic: during fine-tuning, features derived from dictionary information are first combined with the output features of the language model, and fine-tuning is then performed. After fine-tuning, the teacher model is distilled, and the resulting model predictions are used as the training basis for the student model. The model distillation method provided by the present invention introduces dictionary information as key information, so that the model can still capture dictionary information as features even when its size is greatly reduced, thereby achieving the purpose of greatly reducing model size and speeding up inference without sacrificing extraction accuracy.
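The abstract says the teacher model is distilled and its prediction results serve as the training basis for the student model, but the exact training objective is not given in the visible text. The sketch below therefore uses the standard soft-target distillation loss (temperature-scaled KL divergence blended with hard-label cross-entropy); the function name and the hyperparameters T and alpha are assumptions, not the patent's.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """Blend teacher soft targets with gold hard labels (standard formulation, assumed here)."""
    # Soft part: match the student's distribution to the teacher's, softened by temperature T.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard part: ordinary cross-entropy against the annotated labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# usage, with token-level logits flattened to (num_tokens, num_labels):
s = torch.randn(8, 5)          # student logits
t = torch.randn(8, 5)          # teacher logits (from the fine-tuned ALbert teacher)
y = torch.randint(0, 5, (8,))  # gold labels
loss = distillation_loss(s, t, y)
```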

Description

Technical Field

[0001] The invention relates to the technical field of natural language processing in the field of artificial intelligence, and in particular to a model distillation method combined with dynamic vocabulary enhancement.

Background Technique

[0002] Text key information extraction is among the most common tasks in the field of natural language processing. In recent years, since the emergence of Bert, models based on the Transformer mechanism have appeared one after another; from Bert to RoBERTa, and on to XLNet, GPT-3 and other models, the accuracy on key information extraction tasks has been continuously refreshed. However, when NLP tasks are actually deployed, enterprises often adopt a high-concurrency model-deployment architecture out of consideration for cost and efficiency, and large models in a multi-replica system mean that a large amount of GPU resources is occupied. What enterprises pursue is often not the highest accuracy rate, but the ...


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F40/284; G06F40/242; G06N20/00
CPC: G06N20/00; G06F40/242; G06F40/284
Inventor: 顾嘉晟, 李瀚清, 岳小龙, 高翔, 纪达麒, 陈运文
Owner: 达而观数据(成都)有限公司