Training method and device for performing arithmetic operations based on a language model

A language model training method in the field of artificial intelligence. It addresses the problems that conventional training requires very large training data sets and is poorly suited to smaller language models, and it improves calculation accuracy.

Pending Publication Date: 2021-05-07
出门问问(武汉)信息科技有限公司
0 Cites, 2 Cited by

AI Technical Summary

Problems solved by technology

Under the "pre-training + fine-tuning" method, taking GPT-2 as an example, the arithmetic model produced by training can perform some arithmetic operations, but it requires a large amount of training data. For two-digit plus two-digit addition, for instance, GPT-2 needs a training set of more than 3,200 equations to barely reach 10% accuracy, and the more complex three-digit addition requires an even larger training set.
The traditional training method is therefore not suitable for training smaller language models.

Detailed Description of the Embodiments

[0022] Exemplary embodiments of the present invention are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present invention to facilitate understanding, and they should be regarded as exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

[0023] Figure 1 is a schematic flowchart of a training method for performing arithmetic operations based on a language model according to an embodiment of the present invention. The method includes at least the following operations: S101, adding a space character befor...

Abstract

The invention discloses a training method and device for performing arithmetic operations based on a language model. The method comprises the following steps: adding a space character in front of each character in an equation text to generate a quasi equation text; performing word segmentation on the quasi equation text to obtain a corresponding list that comprises a plurality of words; taking the list corresponding to the quasi equation text as a training sample to obtain training sample data; and training a language model on the training sample data to generate an arithmetic model. Because the equation text is preprocessed by adding space characters and then segmented, each character in the equation text is input into the language model as an independent word, so the language model can learn each character of the equation text and the digit position of each character. As a result, the calculation accuracy of the trained arithmetic model is effectively improved even when the language model is trained on a small number of training samples.
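The preprocessing steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the equation format (`"12+34=46"`), the function names, and the use of plain whitespace splitting as a stand-in for the word segmentation step are all hypothetical choices made for this example.

```python
from typing import List


def to_quasi_equation(equation: str) -> str:
    """Add one space character in front of every character of the equation text."""
    return "".join(" " + ch for ch in equation)


def segment(quasi_equation: str) -> List[str]:
    """Split the quasi equation text into words.

    Assumption: whitespace splitting stands in for the word segmentation
    step; the source does not specify which segmenter is used.
    """
    return quasi_equation.split()


def build_training_samples(equations: List[str]) -> List[List[str]]:
    """Turn each equation text into a list of single-character words."""
    return [segment(to_quasi_equation(eq)) for eq in equations]


# Hypothetical equation texts; the exact format is an assumption.
samples = build_training_samples(["12+34=46", "7+8=15"])
for sample in samples:
    print(sample)
```

Each resulting list holds one word per character, so a language model trained on these samples sees every digit and operator as a separate input unit. The training step itself is omitted here because the source does not describe its details in the visible text.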

Description

Technical field

[0001] The invention relates to the technical field of artificial intelligence, and in particular to a training method and device for performing arithmetic operations based on a language model.

Background

[0002] Recent research shows that the giant language model GPT-3 can perform arithmetic operations directly through few-shot learning, without further training, and performs well on arithmetic tasks. For smaller language models such as BERT and GPT-2, however, the accuracy of arithmetic operations after few-shot learning is almost 0, which leaves them with essentially no practical value.

[0003] At present, smaller language models are mostly trained with the "pre-training + fine-tuning" method. Under this training method, taking GPT-2 as an example, the arithmetic model produced by training can perform some arithmetic operations, but it requires a large amount of training data, such as two-digit plus two-d...

Application Information

IPC(8): G06F40/126, G06F40/289, G06F40/216
CPC: G06F40/126, G06F40/289, G06F40/216, Y02D10/00
Inventors: 张旭, 雷欣, 李志飞
Owner: 出门问问(武汉)信息科技有限公司