A model distillation method combined with dynamic vocabulary enhancement

A vocabulary-enhancement and model-distillation technology, applied to computational models, instruments, electrical digital data processing, and related fields. It addresses problems such as model inference depending on high-end hardware, declining model accuracy, and loss of semantic information, with the effect of improved semantic understanding, fast inference, and reduced model size.

Active Publication Date: 2021-04-23
Applicant: 达而观数据(成都)有限公司

AI Technical Summary

Problems solved by technology

As a result, these models that have repeatedly broken records in academia face serious obstacles to industrial deployment: the models are too large and have too many parameters, so training consumes excessive resources, inference depends on high-end hardware, and inference is too slow.
For example, Huawei released TinyBert[1] in October of this year: by distilling the Bert model, the model size was reduced 7.5-fold and inference speed increased 9.4-fold, but model accuracy dropped by 3.2%, meaning that some semantic information was lost in the process.




Embodiment Construction

[0019] The present invention will be further described below in conjunction with the accompanying drawings.

[0020] To make the technical solution of this embodiment clear, the technical terms used herein are explained below:

[0021] Encode: refers to encoding.

[0022] Token: refers to a token, i.e., a basic text unit processed by the model.

[0023] CRF: stands for Conditional Random Field.

[0024] GPU: stands for Graphics Processing Unit (graphics card).

[0025] This embodiment provides a model distillation method combined with dynamic vocabulary enhancement. The method adopts model distillation and adds dictionary information during the fine-tuning process, reducing the size of the student model while improving its accuracy. The overall workflow is shown in Figure 1, and the specific steps are as follows:

[0026] First, the ALBert language model is fine-tuned. This differs from conventional fine-tuning logic: in the process of fine-tuning the ALBert language model, the fine-...
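Although the paragraph above is truncated, the abstract makes the fine-tuning step clear: per-token dictionary-match features are combined with the language model's output features before the tagging head. Below is a minimal sketch of that fusion in PyTorch; the encoder stub, the feature sizes, and the linear tagging head are illustrative stand-ins (the patent's ALBert encoder and a CRF layer would take their places), not the patent's actual implementation.

```python
import torch
import torch.nn as nn

class DictEnhancedTagger(nn.Module):
    """Sketch: fuse per-token dictionary features with the encoder's
    output features before the tagging head, as the abstract describes."""

    def __init__(self, hidden_size=312, dict_feat_size=8, num_tags=9):
        super().__init__()
        # Stand-in for the ALBert encoder being fine-tuned (assumption:
        # one Transformer layer is enough to illustrate the data flow).
        layer = nn.TransformerEncoderLayer(
            d_model=hidden_size, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        # Dense projection of the raw dictionary-match indicators.
        self.dict_proj = nn.Linear(dict_feat_size, dict_feat_size)
        # Linear tagging head; per the patent's glossary, a CRF layer
        # would plausibly sit here in the real model.
        self.head = nn.Linear(hidden_size + dict_feat_size, num_tags)

    def forward(self, token_embeds, dict_feats):
        # token_embeds: (batch, seq, hidden)          encoder inputs
        # dict_feats:   (batch, seq, dict_feat_size)  per-token dictionary hits
        h = self.encoder(token_embeds)
        fused = torch.cat([h, self.dict_proj(dict_feats)], dim=-1)
        return self.head(fused)  # (batch, seq, num_tags) tag logits


# Toy usage: 2 sentences of 16 tokens each.
model = DictEnhancedTagger()
tokens = torch.randn(2, 16, 312)
dict_hits = torch.zeros(2, 16, 8)
print(model(tokens, dict_hits).shape)  # torch.Size([2, 16, 9])
```

Here `dict_feats` would be, for example, one-hot indicators of which dictionary entries match each token; only the concatenation point, fusing dictionary features with encoder features before fine-tuning proceeds, is essential to the idea.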



Abstract

The invention relates to the technical field of natural language processing within artificial intelligence, and discloses a model distillation method combined with dynamic vocabulary enhancement, comprising the following steps: on the basis of the ALBert language model, the language model is adjusted by combining a fine-tuning technique with a dynamic vocabulary enhancement technique to obtain a fine-tuned language model, which serves as the teacher model. Unlike conventional fine-tuning logic, during fine-tuning the features of the dictionary information are combined with the output features of the language model before fine-tuning proceeds. After fine-tuning is finished, the teacher model is distilled, and the resulting model predictions serve as the training basis for the student model. Because the method introduces dictionary information as key information, the model can still capture dictionary information as a feature even when its size is greatly reduced, thereby greatly shrinking the model and accelerating inference speed without sacrificing extraction accuracy.
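The abstract's final step, distilling the teacher so that its predictions become the student's training basis, is commonly implemented as a soft-label loss. Below is a minimal sketch in PyTorch; the temperature-scaled KL-divergence formulation and the `temperature` and `alpha` hyperparameters are standard distillation practice assumed here, not values specified by the patent.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend the teacher's soft predictions with the hard-label loss.
    `temperature` and `alpha` are illustrative hyperparameters."""
    # Soften both distributions so the student learns the teacher's
    # relative confidences, not just its argmax prediction.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_teacher,
                  reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy usage: a batch of 4 examples over 9 tags.
s = torch.randn(4, 9)            # student logits
t = torch.randn(4, 9)            # frozen teacher logits
y = torch.randint(0, 9, (4,))    # gold labels
print(distillation_loss(s, t, y))
```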

Description

Technical field

[0001] The invention relates to the technical field of natural language processing within artificial intelligence, and in particular to a model distillation method combined with dynamic vocabulary enhancement.

Background technique

[0002] Text key-information extraction is one of the most common tasks in natural language processing. In recent years, since the emergence of Bert, models based on the Transformer mechanism have appeared in rapid succession; from Bert to RoBERTa, XLNet, GPT-3 and other models, accuracy on key-information extraction tasks has been repeatedly pushed higher. However, when NLP tasks are actually deployed, enterprises, weighing factors such as cost and efficiency, often adopt a high-concurrency model-deployment architecture, and large models in a multi-copy system occupy a large amount of GPU resources. What enterprises pursue is often not the highest accuracy rate, but the ...


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F40/284, G06F40/242, G06N20/00
CPC: G06N20/00, G06F40/242, G06F40/284
Inventors: 顾嘉晟, 李瀚清, 岳小龙, 高翔, 纪达麒, 陈运文
Owner: 达而观数据(成都)有限公司