
A dynamic vocabulary enhancement combined model distillation method

A vocabulary-enhancement and model-distillation technology, applicable to computational models, instruments, electrical digital data processing, and related fields. It addresses problems such as model inference depending on high-end hardware, declining model accuracy, and loss of semantic information, with the effects of improved semantic understanding, faster inference, and reduced model size.

Active Publication Date: 2021-04-23
达而观数据(成都)有限公司

AI Technical Summary

Problems solved by technology

These models, despite repeatedly breaking records in academia, face serious obstacles in industrial deployment: they are too large and have too many parameters, which makes training prohibitively resource-intensive, makes inference dependent on high-end hardware, and makes inference too slow.
For example, Huawei released TinyBert [1] in October this year. By distilling the Bert model, it reduced the model size by a factor of 7.5 and increased inference speed by a factor of 9.4, but model accuracy dropped by 3.2%: some semantic information is lost in the distillation process.


Image

  • A dynamic vocabulary enhancement combined model distillation method

Examples


Embodiment Construction

[0019] The present invention will be further described below in conjunction with the accompanying drawings.

[0020] To make the technical solution of this embodiment clear, the technical terms used in this embodiment are explained below:

[0021] Encode: refers to encoding.

[0022] Token: refers to a token.

[0023] CRF: refers to a Conditional Random Field.

[0024] GPU: refers to the Graphics Processing Unit (graphics card).

[0025] This embodiment provides a model distillation method combined with dynamic vocabulary enhancement. The method applies model distillation and adds dictionary information during the fine-tuning process, reducing the size of the student model while improving its accuracy. The overall workflow is shown in Figure 1; the specific steps are as follows:

[0026] First, the ALBert language model is fine-tuned, in a way that differs from conventional fine-tuning logic. In the process of fine-tuning the ALBert language model, the fine-...
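The surviving text describes combining dictionary features with the language model's output features during fine-tuning. Below is a minimal sketch of such a fusion head; the class name, the feature dimensions, and the use of binary lexicon-match indicators per token are illustrative assumptions, not details taken from the patent.

```python
import torch
import torch.nn as nn

class LexiconFusionHead(nn.Module):
    """Concatenates language-model token features with per-token dictionary
    features before the token-classification layer (hypothetical design)."""
    def __init__(self, hidden_size: int, dict_feat_size: int, num_labels: int):
        super().__init__()
        # Project the fused [LM feature ; dictionary feature] vector to logits.
        self.classifier = nn.Linear(hidden_size + dict_feat_size, num_labels)

    def forward(self, lm_hidden: torch.Tensor, dict_feats: torch.Tensor) -> torch.Tensor:
        # lm_hidden:  (batch, seq_len, hidden_size)    encoder output features
        # dict_feats: (batch, seq_len, dict_feat_size) lexicon-match features
        fused = torch.cat([lm_hidden, dict_feats], dim=-1)
        return self.classifier(fused)

# Toy usage: 2 sentences of 8 tokens, ALBert-base-like hidden size 768,
# and a 4-dimensional dictionary feature (e.g. match flags for 4 lexicons).
head = LexiconFusionHead(hidden_size=768, dict_feat_size=4, num_labels=5)
lm_hidden = torch.randn(2, 8, 768)
dict_feats = torch.randint(0, 2, (2, 8, 4)).float()
print(head(lm_hidden, dict_feats).shape)  # torch.Size([2, 8, 5])
```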



Abstract

The invention relates to the technical field of natural language processing within artificial intelligence, and discloses a model distillation method combined with dynamic vocabulary enhancement, comprising the following steps. Starting from an ALBert language model, the language model is adjusted by combining a fine-tuning technique with a dynamic vocabulary enhancement technique, and the fine-tuned language model serves as the teacher model. Unlike conventional fine-tuning logic, during fine-tuning the features of the dictionary information are combined with the output features of the language model before fine-tuning proceeds. After fine-tuning is complete, the teacher model is distilled, and the resulting model predictions serve as the training basis for the student model. Because the method introduces dictionary information as key information, the model can still capture dictionary information as a feature even when its size is greatly reduced, so the model size can be greatly reduced and inference accelerated without sacrificing extraction accuracy.
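The abstract specifies that the teacher's predictions become the training targets of the student. A standard way to realize this is a knowledge-distillation objective mixing soft teacher targets with hard gold labels; the sketch below assumes that formulation, and the temperature, loss weighting, and toy models are illustrative choices, since the patent text does not specify the loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for the two stages: a larger "teacher" (playing the role of the
# fine-tuned language model) and a much smaller "student".
teacher = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 3))
student = nn.Linear(20, 3)  # far fewer parameters, hence faster inference

# Fused input: stand-in LM output features plus lexicon-match features,
# mirroring the feature combination performed during teacher fine-tuning.
lm_feats = torch.randn(64, 16)
lex_feats = torch.randint(0, 2, (64, 4)).float()
inputs = torch.cat([lm_feats, lex_feats], dim=-1)
labels = torch.randint(0, 3, (64,))

T, alpha = 2.0, 0.5  # temperature and loss weighting (assumed values)
opt = torch.optim.Adam(student.parameters(), lr=1e-2)
for step in range(200):
    with torch.no_grad():
        t_logits = teacher(inputs)            # teacher predictions = soft targets
    s_logits = student(inputs)
    # Soft term: student matches the teacher's softened distribution.
    soft = F.kl_div(F.log_softmax(s_logits / T, dim=-1),
                    F.softmax(t_logits / T, dim=-1),
                    reduction="batchmean") * T * T
    # Hard term: standard cross-entropy against gold labels.
    hard = F.cross_entropy(s_logits, labels)
    loss = alpha * soft + (1 - alpha) * hard
    opt.zero_grad(); loss.backward(); opt.step()
print(f"final distillation loss: {loss.item():.4f}")
```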

Description

Technical Field

[0001] The invention relates to the technical field of natural language processing within artificial intelligence, and in particular to a model distillation method combined with dynamic vocabulary enhancement.

Background Technique

[0002] Text key information extraction is one of the most common tasks in natural language processing. In recent years, since the emergence of Bert, models based on the Transformer mechanism have appeared in an endless stream. From Bert to RoBERTa, to XLNet, GPT-3, and other models, the accuracy of key information extraction tasks has been continuously refreshed. However, when NLP tasks are actually deployed, enterprises, considering factors such as cost and efficiency, often adopt a high-concurrency model-deployment architecture, and large models in a multi-copy system occupy a large amount of GPU resources. What enterprises pursue is often not the highest accuracy, but the ...


Application Information

Patent Type & Authority Applications(China)
IPC(8): G06F40/284, G06F40/242, G06N20/00
CPC: G06N20/00, G06F40/242, G06F40/284
Inventors: 顾嘉晟, 李瀚清, 岳小龙, 高翔, 纪达麒, 陈运文
Owner: 达而观数据(成都)有限公司