Data enhancement method based on adversarial training in text classification scene

A data enhancement technique for text classification scenarios, applied in unstructured text data retrieval, text database clustering/classification, special data processing applications, etc. It addresses problems such as model collapse, the discreteness of natural language, and the neglect of data enhancement in NLP research, and achieves high accuracy and improved sample diversity.

Pending Publication Date: 2022-05-31
TIANJIN UNIV
Cites: 0. Cited by: 1.

AI Technical Summary

Problems solved by technology

[0003] Most existing research on data enhancement methods targets Computer Vision (CV) tasks, where it has played a significant role in large-scale neural network models; in Natural Language Processing (NLP) tasks, however, it has been largely neglected by researchers.
On the one hand, the key to data enhancement is a label-preserving transformation, but natural language is discrete, so the simple operations used on image data cannot be applied directly to practical text tasks.
On the other hand, existing neural network models are generally large and over-parameterized. How to fine-tune such models has become a focus of research, especially because, when the data set for a specific task contains too little labeled data, even slight changes may cause the model to collapse.



Embodiment Construction

[0029] The specific implementation manners of the present invention will be described in detail below in conjunction with the accompanying drawings.

[0030] As shown in Figure 1, the data enhancement method based on adversarial training in a text classification scenario according to the present invention is implemented as follows:

[0031] Step 1: back translation

[0032] Suppose the training set is denoted $D = \{x_i, y_i\}_{1 \dots N}$, where $x_i$ is an original sample and $y_i$ is the label of the original sample $x_i$. A back-translation operation is performed on each original sample $x_i$ in the training set to generate a sample sequence set, denoted $D' = \{x'_i, y'_i\}_{1 \dots N}$, where $x'_i$ is the paraphrase sample of the original sample $x_i$ and $y'_i$ is the label of the paraphrase sample $x'_i$.

[0033] Suppose back-translation is expressed by the formula $x'_i = \mathrm{BackTrans}(x_i)$, where the parameter $x'_i$ is the paramet...
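The back-translation step can be illustrated with a minimal sketch. It assumes that BackTrans is a round trip through a pivot language and that the paraphrase keeps the label of its original sample; neither detail is confirmed by the visible text. The names `back_translate`, `build_paraphrase_set`, `toy_to_pivot`, and `toy_from_pivot` are hypothetical, and the dictionary-based "translators" are toy stand-ins for a real machine translation system.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, int]  # (text x_i, label y_i)
Translator = Callable[[str], str]


def back_translate(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """x'_i = BackTrans(x_i): translate to a pivot language and back (assumed form)."""
    return from_pivot(to_pivot(text))


def build_paraphrase_set(
    dataset: List[Sample], to_pivot: Translator, from_pivot: Translator
) -> List[Sample]:
    """Build D' = {x'_i, y'_i} from D, assuming y'_i = y_i (label-preserving)."""
    return [(back_translate(x, to_pivot, from_pivot), y) for x, y in dataset]


# Toy stand-ins for real translation models (hypothetical, for illustration only).
_EN_TO_PIVOT = {"good": "bon", "great": "bon", "movie": "film", "film": "film"}
_PIVOT_TO_EN = {"bon": "good", "film": "movie"}


def toy_to_pivot(text: str) -> str:
    return " ".join(_EN_TO_PIVOT.get(word, word) for word in text.split())


def toy_from_pivot(text: str) -> str:
    return " ".join(_PIVOT_TO_EN.get(word, word) for word in text.split())


D = [("great film", 1), ("boring film", 0)]
D_prime = build_paraphrase_set(D, toy_to_pivot, toy_from_pivot)
print(D_prime)
```

In practice the two translator callables would wrap whatever translation system is available; passing them in as parameters keeps the sketch independent of any particular library.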


Abstract

The invention discloses a data enhancement method based on adversarial training in a text classification scenario, comprising the following steps: back-translating each original sample $x_i$ in the training set to generate a sample sequence set $D' = \{x'_i, y'_i\}_{1 \dots N}$; performing word embedding on the paraphrase sample $x'_i$ to obtain $p_\theta(x'_i)$; performing adversarial training on the paraphrase sample $x'_i$ to obtain an adversarial sample; performing word embedding on the original sample $x_i$ to obtain $p_\theta(x_i)$, performing word embedding with added perturbation on the adversarial sample, and computing the comparison loss between the original sample and the adversarial sample; and, if the computed comparison loss falls within a preset threshold range, adding the generated adversarial sample to the original data set and feeding the adversarial sample together with the original data set to a classification model for simulation, to obtain a simulated classification accuracy result. The method combines adversarial training with data enhancement techniques such as back-translation, random noise injection, and cross enhancement, so as to obtain enhanced samples with the highest quality and the highest diversity.
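The perturbation, loss, and threshold steps of the abstract can be sketched as follows. This is an illustration under stated assumptions, not the patented procedure: the classifier is a toy linear softmax layer, the perturbation is a normalized-gradient step in the style of the Fast Gradient Method, the "comparison loss" is taken to be the KL divergence between $p_\theta(x_i)$ and the prediction on the perturbed paraphrase embedding, and the threshold range `(0.05, 1.0)` is an arbitrary placeholder. All function names and numeric values are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(logits: Vector) -> Vector:
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(weights: Matrix, embedding: Vector) -> Vector:
    """p_theta(x): softmax over a toy linear layer applied to an embedding."""
    return softmax([sum(w * e for w, e in zip(row, embedding)) for row in weights])


def loss_gradient(weights: Matrix, embedding: Vector, label: int) -> Vector:
    """Gradient of cross-entropy w.r.t. the embedding: W^T (p - onehot(label))."""
    probs = predict(weights, embedding)
    delta = [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]
    return [
        sum(weights[k][j] * delta[k] for k in range(len(weights)))
        for j in range(len(embedding))
    ]


def perturb(embedding: Vector, gradient: Vector, epsilon: float) -> Vector:
    """Move the embedding by epsilon along the normalized loss gradient."""
    norm = math.sqrt(sum(g * g for g in gradient)) or 1.0
    return [e + epsilon * g / norm for e, g in zip(embedding, gradient)]


def kl_divergence(p: Vector, q: Vector) -> float:
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def within_threshold(loss: float, low: float, high: float) -> bool:
    """Keep an adversarial sample only if its loss lies in the preset range."""
    return low <= loss <= high


W = [[1.0, 0.0], [0.0, 1.0]]  # toy classifier parameters (hypothetical)
emb_orig = [2.0, 0.0]         # embedding of the original sample x_i
emb_para = [1.5, 0.5]         # embedding of the paraphrase sample x'_i
label = 0

emb_adv = perturb(emb_para, loss_gradient(W, emb_para, label), epsilon=0.5)
p_orig, p_para, p_adv = (predict(W, e) for e in (emb_orig, emb_para, emb_adv))
loss = kl_divergence(p_orig, p_adv)
accepted = within_threshold(loss, low=0.05, high=1.0)
print(f"comparison loss = {loss:.3f}, accepted = {accepted}")
```

The lower bound rejects adversarial samples that are nearly identical to the original (little added diversity), and the upper bound rejects samples that drift so far that the label may no longer hold; whether the patent uses the range this way is an assumption.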

Description

Technical field

[0001] The invention belongs to the field of natural language processing, and more specifically relates to a data enhancement method based on adversarial training in a text classification scenario.

Background

[0002] There are generally two forms of data enhancement: the first is data expansion, and the second is feature enhancement. Either way, the essence is to generate new samples from the original samples with the help of auxiliary information or auxiliary data, thereby increasing sample diversity. Data expansion adds newly generated unlabeled or labeled samples to the original data set. Feature enhancement amplifies the data features of the original sample so that the classification model can identify it more easily.

[0003] Most of the existing research on data enhancement methods is oriented to the task of Computer Vision (CV) and has been able to play a very significant role in large-scale neural network models,...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/35, G06F40/58, G06N20/00
CPC: G06F16/35, G06F40/58, G06N20/00
Inventors: 李剑, 冯雪松, 于永新
Owner: TIANJIN UNIV