A Model Distillation Method Combined with Dynamic Vocabulary Augmentation

A vocabulary-enhancement and distillation technology, applied to computing models, machine learning, instruments, etc., which solves the problems of model inference depending on high-end hardware, declining model accuracy, and excessive model size, achieving the effects of improved semantic understanding, fast inference, and low resource consumption.

Active Publication Date: 2021-06-18
达而观数据(成都)有限公司

AI Technical Summary

Problems solved by technology

This leaves these models, which have repeatedly broken records in academia, facing great difficulties in industrial deployment: the model is too large and has too many parameters, so the training process consumes too many resources, inference depends too heavily on high-end hardware, inference is too slow, and so on.
For example, Huawei released TinyBert [1] in October of this year. By distilling the Bert model, the model size was reduced by a factor of 7.5 and the inference speed was increased by a factor of 9.4; however, the accuracy of the model dropped by 3.2%, as some semantic information is lost in the distillation process.




Detailed Description of the Embodiments

[0019] The present invention will be further described below in conjunction with the accompanying drawings.

[0020] To make the technical solution of this embodiment clear, the technical terms mentioned in it are explained below:

[0021] Encode: denotes encoding.

[0022] Token: denotes a token, i.e., a unit of segmented text.

[0023] CRF: stands for Conditional Random Field.

[0024] GPU: stands for graphics processing unit.

[0025] This embodiment provides a model distillation method combined with dynamic vocabulary enhancement. The method applies model distillation and adds dictionary information during the fine-tuning process, so as to reduce the size of the student model while improving its accuracy. The overall workflow is shown in Figure 1; the specific steps are as follows:

[0026] First, the ALbert language model is fine-tuned, and this differs from conventional fine-tuning logic. In the process of fine-tuning the ALbert language model, the fine-...
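The visible text is cut off here, but the abstract below describes the core of this step: features derived from dictionary information are combined with the output features of the language model before fine-tuning proceeds. The following is a minimal PyTorch sketch of that fusion step, not the patent's own implementation; the toy encoder stands in for the fine-tuned ALbert model, and all names (DictAugmentedTagger, dictionary_features, lexicon) and the one-dimensional match feature are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):
    """Placeholder for the fine-tuned ALbert encoder (illustrative only)."""
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden_size)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(token_ids)  # (batch, seq_len, hidden_size)

class DictAugmentedTagger(nn.Module):
    """Concatenates per-token dictionary features with the language model's
    output features, then classifies each token."""
    def __init__(self, encoder: nn.Module, hidden_size: int, dict_dim: int, num_labels: int):
        super().__init__()
        self.encoder = encoder
        self.fuse = nn.Linear(hidden_size + dict_dim, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)  # a CRF layer could sit here instead

    def forward(self, token_ids: torch.Tensor, dict_feats: torch.Tensor) -> torch.Tensor:
        h = self.encoder(token_ids)                   # language-model output features
        fused = torch.tanh(self.fuse(torch.cat([h, dict_feats], dim=-1)))
        return self.classifier(fused)                 # per-token label logits

def dictionary_features(tokens: list[str], lexicon: set[str]) -> torch.Tensor:
    """1.0 where a token appears in the domain dictionary, else 0.0 (one feature per token)."""
    return torch.tensor([[[1.0] if t in lexicon else [0.0] for t in tokens]])

# usage: four tokens, two of which hit the (hypothetical) domain dictionary
model = DictAugmentedTagger(ToyEncoder(vocab_size=100, hidden_size=32),
                            hidden_size=32, dict_dim=1, num_labels=5)
ids = torch.randint(0, 100, (1, 4))
feats = dictionary_features(["contract", "party", "the", "price"], {"contract", "party"})
logits = model(ids, feats)  # (1, 4, 5)
```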



Abstract

The invention relates to the technical field of natural language processing within artificial intelligence, and discloses a model distillation method combined with dynamic vocabulary enhancement. The method includes: on the basis of the ALbert language model, adjusting the language model through fine-tuning technology combined with dynamic vocabulary enhancement technology to obtain a fine-tuned language model, which serves as the teacher model. When fine-tuning the language model, the procedure differs from conventional fine-tuning logic: during fine-tuning, features derived from dictionary information are first combined with the output features of the language model, and fine-tuning is then performed. After fine-tuning, the teacher model is distilled, and the resulting model predictions are used as the training basis for the student model. The model distillation method provided by the present invention introduces dictionary information as key information, so that the model can still capture dictionary information as features even when its size is greatly reduced, thereby achieving the purpose of greatly reducing model size and speeding up inference without sacrificing extraction accuracy.
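The abstract says the teacher model is distilled and its prediction results serve as the training basis for the student model, but the exact training objective is not given in the visible text. The sketch below therefore uses the standard soft-target distillation loss (temperature-scaled KL divergence blended with hard-label cross-entropy); the function name and the hyperparameters T and alpha are assumptions, not the patent's.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      T: float = 2.0, alpha: float = 0.5) -> torch.Tensor:
    """Blend teacher soft targets with gold hard labels (standard formulation, assumed here)."""
    # Soft part: match the student's distribution to the teacher's, softened by temperature T.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard part: ordinary cross-entropy against the annotated labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

# usage, with token-level logits flattened to (num_tokens, num_labels):
s = torch.randn(8, 5)          # student logits
t = torch.randn(8, 5)          # teacher logits (from the fine-tuned ALbert teacher)
y = torch.randint(0, 5, (8,))  # gold labels
loss = distillation_loss(s, t, y)
```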

Description

Technical Field

[0001] The invention relates to the technical field of natural language processing in the field of artificial intelligence, and in particular to a model distillation method combined with dynamic vocabulary enhancement.

Background Technique

[0002] Text key information extraction is among the most common tasks in the field of natural language processing. In recent years, since the emergence of Bert, models based on the Transformer mechanism have appeared one after another; from Bert to RoBERTa, and on to XLNet, GPT-3 and other models, the accuracy on key information extraction tasks has been continuously refreshed. However, when NLP tasks are actually deployed, enterprises often adopt a high-concurrency model-deployment architecture out of consideration for cost and efficiency, and large models in a multi-replica system mean that a large amount of GPU resources is occupied. What enterprises pursue is often not the highest accuracy rate, but the ...


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06F40/284; G06F40/242; G06N20/00
CPC: G06N20/00; G06F40/242; G06F40/284
Inventor: 顾嘉晟, 李瀚清, 岳小龙, 高翔, 纪达麒, 陈运文
Owner: 达而观数据(成都)有限公司