Adversarial text generation method and system for black box text classification model and medium

An adversarial text generation technique for black-box text classification models, applied in text database clustering/classification, unstructured text data retrieval, natural language data processing, and related fields. It addresses the problem that the attacker cannot know the internal structure and parameters of the victim model, and it achieves a high attack success rate while making the fluency of the generated text easy to guarantee.

Pending Publication Date: 2022-01-04
CHINA PING AN LIFE INSURANCE CO LTD

AI Technical Summary

Problems solved by technology

[0003] However, the attacker cannot know the internal structure and parameters of the victim model; that is, the target of the attack is a black-box text classification model. An attack method for black-box text classification models is therefore urgently needed.



Examples


Embodiment 1

[0046] This embodiment provides an adversarial text generation method for a black-box text classification model. As shown in Figure 1, the method includes: S101, acquiring the original corpus and the classification label of the black-box text classification model; S102, performing word segmentation on the original corpus to obtain the word sequence corresponding to the original corpus; S103, acquiring a preset number of synonyms for each word in the word sequence; S104, sequentially replacing the word at each position of a candidate sentence with that word's synonyms to form new sentences, which together form a new candidate text set; S105, feeding each sentence in the new candidate text set in turn into the black-box text classification model to obtain the corresponding output, and selecting the K sentences with the lowest probability value for the original label of the original corpus to form the adversarial text set, where K is an integer. Each step is described in detail below.
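Steps S101 to S105 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: `toy_black_box`, `SYNONYMS`, and `generate_adversarial` are hypothetical names, whitespace splitting stands in for word segmentation, and each candidate replaces exactly one word of the original sentence, which is only one possible reading of S104.

```python
from typing import Callable, Dict, List, Tuple

Classifier = Callable[[str], Dict[str, float]]

# Hypothetical synonym table; the source does not name a synonym resource.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "great"],
    "movie": ["film"],
}


def toy_black_box(sentence: str) -> Dict[str, float]:
    """Stand-in victim model: callers see label probabilities only."""
    words = sentence.split()
    pos = 0.5 + 0.4 * ("good" in words) + 0.2 * ("great" in words) + 0.1 * ("fine" in words)
    pos = min(pos, 0.95)
    return {"pos": pos, "neg": 1.0 - pos}


def generate_adversarial(
    corpus: str,
    label: str,
    classify: Classifier,
    synonyms: Dict[str, List[str]],
    n_synonyms: int,
    k: int,
) -> List[Tuple[str, float]]:
    # S102: word segmentation (assumption: whitespace split; Chinese text
    # would need a real segmenter).
    words = corpus.split()

    # S103 + S104: take up to n_synonyms synonyms per word and substitute
    # them position by position to build the candidate text set.
    candidates: List[str] = []
    for i, word in enumerate(words):
        for syn in synonyms.get(word, [])[:n_synonyms]:
            candidates.append(" ".join(words[:i] + [syn] + words[i + 1:]))

    # S105: query the black box and keep the K candidates with the lowest
    # probability for the original label.
    scored = [(c, classify(c)[label]) for c in candidates]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


if __name__ == "__main__":
    for sentence, prob in generate_adversarial(
        "a good movie", "pos", toy_black_box, SYNONYMS, n_synonyms=2, k=2
    ):
        print(sentence, round(prob, 2))
```

The sketch only reads output probabilities from the classifier, which matches the black-box setting: no gradients or parameters of the victim model are used.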

[0047] ...

Embodiment 2

[0062] This embodiment provides an adversarial text generation method for a black-box text classification model, comprising: a first step of acquiring the original corpus and the classification label of the black-box text classification model; a second step of performing word segmentation on the original corpus to obtain the word sequence corresponding to the original corpus; a third step of acquiring a preset number of synonyms for each word in the word sequence; a fourth step of calculating the importance of each word in the word sequence and sorting the words by importance; a fifth step of sequentially replacing the word at each position of a candidate sentence with that word's synonyms to form new sentences, which together form a new candidate text set; a sixth step of feeding each sentence in the new candidate text set in turn into the black-box text classification model to obtain the corresponding output, with the lowest probability value selected for the original label correspo...
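The visible text does not say how word importance in the fourth step is computed. One common choice in black-box attacks is a leave-one-out score: delete a word, query the model, and measure how far the probability of the original label drops. The sketch below assumes that scoring rule; `toy_black_box` and `rank_by_importance` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

Classifier = Callable[[str], Dict[str, float]]


def toy_black_box(sentence: str) -> Dict[str, float]:
    """Stand-in victim model: callers see label probabilities only."""
    pos = 0.5 + 0.4 * ("good" in sentence.split())
    return {"pos": pos, "neg": 1.0 - pos}


def rank_by_importance(words: List[str], label: str, classify: Classifier) -> List[int]:
    """Return word positions sorted from most to least important.

    Assumption: importance is the drop in the original label's probability
    when the word is deleted (leave-one-out). The source may use a
    different measure.
    """
    base = classify(" ".join(words))[label]
    drops = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        drops.append(base - classify(reduced)[label])
    return sorted(range(len(words)), key=lambda i: drops[i], reverse=True)


if __name__ == "__main__":
    words = "a good movie".split()
    order = rank_by_importance(words, "pos", toy_black_box)
    print([words[i] for i in order])
```

Processing positions in this order lets the substitution step spend its model queries on the words that influence the prediction most.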

Embodiment 3

[0070] This embodiment provides an adversarial text generation system for a black-box text classification model. As shown in Figure 2, the system includes:

[0071] An acquisition module 301, configured to acquire the original corpus and the classification label of the black-box text classification model;

[0072] A word segmentation module 302, configured to perform word segmentation on the original corpus to obtain the word sequence corresponding to the original corpus;

[0073] A synonym module 303, configured to acquire a preset number of synonyms for each word in the word sequence;

[0074] A candidate module 304, configured to sequentially replace the word at each position of a candidate sentence with that word's synonyms to form new sentences, which together form a new candidate text set;

[0075] An adversarial text generation module 305, configured to feed each sentence in the new candidate text set in turn into the black-box text classification model to obtain the corresponding output, and to select by the original label of the original corpus cor...


Abstract

The invention relates to the technical field of adversarial text generation, and in particular to an adversarial text generation method, system, and medium for a black-box text classification model. The method comprises the following steps: collecting an original corpus and a classification label of a black-box text classification model; performing word segmentation on the original corpus to obtain a word sequence corresponding to the original corpus; obtaining a preset number of synonyms for each word in the word sequence; sequentially replacing the words at each position of the candidate sentences with synonyms of those words to form new sentences, which form a new candidate text set; and taking each sentence in the new candidate text set in turn as the input of the black-box text classification model to obtain a corresponding output result, then selecting the K sentences with the lowest probability value for the original label of the original corpus to form an adversarial text set. The invention performs better in adversarial sample quality, effectiveness control, and attack success rate, and the generated adversarial samples are smooth and fluent.

Description

Technical field

[0001] This application relates to the field of adversarial text generation, and more specifically to an adversarial text generation method, system, and medium for a black-box text classification model.

Background

[0002] The security of deep learning has gradually been recognized and taken seriously by academia and industry, but awareness alone does not prevent or resist the various external adversarial attacks on deep learning models, especially in the image field. Natural language processing faces the same problem as face recognition. For example, when a sentence that a model normally classifies correctly is rewritten into a synonymous sentence, the model may misclassify it. As another example, a sensitive-information recognition model identifies sentences that contain sensitive information, such as abuse, pornography, and other related content. If a user sends a sentence containing sensitive informati...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F16/335, G06F16/35, G06F40/247, G06F40/289
CPC: G06F16/337, G06F16/353, G06F40/289, G06F40/247
Inventor: 陆凯
Owner: CHINA PING AN LIFE INSURANCE CO LTD