Machine learning training data poisoning attack defense method

A technology in training data and machine learning, applied in the field of information security, that addresses the limitation of existing defenses to specific attack methods

Active Publication Date: 2020-11-10
HUAZHONG UNIV OF SCI & TECH
Cites: 4; cited by: 23

AI Technical Summary

Problems solved by technology

[0005] The present invention provides a method for defending machine learning training data against poisoning attacks. It addresses a limitation of existing defenses for the training data used to train intelligent security detection models: those defenses are effective only against specific attack methods.



Examples

Experimental program
Comparison scheme
Effect test

Embodiment 1

[0036] A defense method against poisoning attacks on machine learning training data, comprising:

[0037] Obtain the prediction-value distribution of the clean training data set corresponding to the training data set to be identified; input each piece of training data to be identified into the trained prediction model to obtain its predicted value; and determine, based on the predicted value and the prediction-value distribution, whether that training data is poisoned data, thereby defending against the attack.
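Paragraph [0037] does not say how a single predicted value is compared against the distribution. A minimal sketch, assuming the predicted value is a scalar score (for example, the model's confidence in a sample's given label, where lower means less consistent with clean data) and assuming a one-sided empirical-quantile rule with a hypothetical threshold `alpha`, could look like this. The function name `flag_poisoned` and the decision rule are illustrative only, not the patent's method.

```python
from bisect import bisect_left
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def flag_poisoned(
    predict: Callable[[T], float],
    candidates: Sequence[T],
    reference_scores: Sequence[float],
    alpha: float = 0.05,
) -> List[bool]:
    """Flag candidates whose predicted value falls in the lower alpha-tail of
    the clean reference distribution.

    ASSUMPTION: lower scores mean "less consistent with clean data"; the
    one-sided quantile rule and the default alpha are hypothetical choices.
    """
    if not reference_scores:
        raise ValueError("reference distribution is empty")
    ordered = sorted(reference_scores)
    flags = []
    for x in candidates:
        score = predict(x)
        # Fraction of clean reference scores strictly below this candidate's score.
        tail = bisect_left(ordered, score) / len(ordered)
        flags.append(tail < alpha)
    return flags


# Toy usage: the candidates are already scores, so `predict` is the identity.
reference = [0.70 + 0.01 * i for i in range(20)]
print(flag_poisoned(lambda s: s, [0.9, 0.1], reference))  # prints [False, True]
```

A rank-based (empirical quantile) comparison is used here because it makes no parametric assumption about the shape of the reference distribution; a density estimate or a two-sided rule would be equally plausible readings of the text.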

[0038] The prediction model is obtained by the following training method: perform data augmentation on trusted training data of the same type as the training data to be identified to generate multiple pieces of synthetic data; then train the prediction model on an enhanced data set composed of the synthetic data and the trusted training data. The distribution of the enhanced data set is the same as that of the clean training data set, and the prediction model is based on the distributi...
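The training procedure in [0038] can be sketched end to end on toy data. This is a minimal illustration under stated assumptions: Gaussian jitter stands in for the unspecified augmentation, a nearest-centroid classifier stands in for the unspecified prediction model, and the "predicted value" is taken to be the softmax probability of a sample's own label. The names `augment`, `train_centroids`, `label_confidence`, and `build_reference` are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

# A sample is (feature vector, label). Toy vectors stand in for features
# extracted from security video data (ASSUMPTION).
Sample = Tuple[Tuple[float, ...], int]
Model = Dict[int, Tuple[float, ...]]


def augment(trusted: Sequence[Sample], copies: int = 5,
            sigma: float = 0.05, seed: int = 0) -> List[Sample]:
    """Generate synthetic samples by adding Gaussian jitter to trusted samples.

    ASSUMPTION: jitter is a placeholder for the patent's augmentation method.
    """
    rng = random.Random(seed)
    synthetic: List[Sample] = []
    for features, label in trusted:
        for _ in range(copies):
            noisy = tuple(v + rng.gauss(0.0, sigma) for v in features)
            synthetic.append((noisy, label))
    return synthetic


def train_centroids(data: Sequence[Sample]) -> Model:
    """Hypothetical prediction model: one mean feature vector per label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for features, label in data:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: tuple(s / counts[c] for s in acc) for c, acc in sums.items()}


def label_confidence(model: Model, sample: Sample) -> float:
    """Predicted value: softmax probability of the sample's own label, using
    negative squared distance to each centroid as the logit."""
    features, label = sample
    logits = {c: -sum((a - b) ** 2 for a, b in zip(features, mu))
              for c, mu in model.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(z - top) for c, z in logits.items()}
    return exps[label] / sum(exps.values())


def build_reference(trusted: Sequence[Sample],
                    copies: int = 5) -> Tuple[Model, List[float]]:
    """Train on the enhanced set (trusted + synthetic) and record the
    distribution of predicted values the model outputs on that same set."""
    enhanced = list(trusted) + augment(trusted, copies)
    model = train_centroids(enhanced)
    reference = [label_confidence(model, s) for s in enhanced]
    return model, reference


trusted = [((0.0, 0.0), 0), ((0.1, 0.0), 0), ((1.0, 1.0), 1), ((0.9, 1.0), 1)]
model, reference = build_reference(trusted)
print(len(reference), min(reference))
print(label_confidence(model, ((0.0, 0.0), 1)))  # a mislabeled point scores low
```

The reference scores come from the enhanced set itself, mirroring the statement that the distribution of predicted values on the enhanced data set serves as the prediction-value distribution; a held-out split would be a reasonable alternative if overfitting is a concern.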

Embodiment 2

[0082] A computer-readable storage medium, comprising a stored computer program, wherein, when the computer program is run by a processor, the device on which the storage medium resides is controlled to perform the above-described method for defending machine learning training data against poisoning attacks. The related technical solutions are the same as in Embodiment 1 and are not repeated here.


Abstract

The invention belongs to the field of information security and relates in particular to a method for defending machine learning training data against poisoning attacks. It is intended for intelligent security applications in which the training data are collected from security video data. The method comprises: obtaining the prediction-value distribution of a clean training data set corresponding to the training data set to be identified; inputting each piece of training data to be identified into a trained prediction model to obtain a predicted value; and identifying poisoned data based on the predicted value and the prediction-value distribution, thereby realizing the defense. The prediction model is obtained by generating synthetic data from trusted training data of the same type as the training data to be identified and then training on an enhanced data set consisting of the synthetic data and the trusted training data. The distribution of the enhanced data set is the same as that of the clean training data set, and the distribution of the prediction values that the model outputs on the enhanced data set is taken as the prediction-value distribution. The method provides effective protection without being restricted to a particular machine learning algorithm or attack type, overcoming the limitation of existing defenses to specific attacks.

Description

Technical field

[0001] The invention belongs to the field of information security and more specifically relates to a method for defending machine learning training data against poisoning attacks.

Background art

[0002] In recent years, with the development of machine learning, machine-learning-based systems such as autonomous driving systems, face detection systems, and speech recognition systems have been widely deployed, especially intelligent security systems. However, the security issues facing machine learning itself are gradually emerging.

[0003] Machine learning refers to learning continuously from a large amount of training data, identifying features, and building models until an effective system model is obtained. Recent studies have shown that machine learning is highly susceptible to data poisoning attacks, in which an attacker can disrupt the learning process by injecting a small number of malicious samples into the training data...
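To make the threat in [0003] concrete, here is a minimal sketch of one well-known poisoning strategy: appending a small number of mislabeled copies of clean samples (label flipping). This is an illustrative assumption; the patent does not specify which attacks it was evaluated against, and the function name `inject_flipped_copies` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], int]  # (feature vector, label)


def inject_flipped_copies(clean: Sequence[Sample], fraction: float,
                          num_classes: int,
                          seed: int = 0) -> Tuple[List[Sample], List[int]]:
    """Append mislabeled copies of randomly chosen clean samples.

    Returns the poisoned data set and the indices of the injected samples.
    ASSUMPTION: label flipping is only one example of a poisoning attack.
    """
    if num_classes < 2:
        raise ValueError("label flipping needs at least two classes")
    rng = random.Random(seed)
    n_poison = max(1, int(round(fraction * len(clean))))
    chosen = rng.sample(range(len(clean)), n_poison)
    poisoned = list(clean)
    poison_idx: List[int] = []
    for i in chosen:
        features, label = clean[i]
        # Shift the label by a nonzero offset so it is guaranteed to be wrong.
        wrong = (label + rng.randrange(1, num_classes)) % num_classes
        poison_idx.append(len(poisoned))
        poisoned.append((features, wrong))
    return poisoned, poison_idx


clean = [((float(i),), i % 2) for i in range(100)]
poisoned, idx = inject_flipped_copies(clean, fraction=0.05, num_classes=2)
print(len(poisoned), idx)  # prints 105 [100, 101, 102, 103, 104]
```

Even a 5% injection like this duplicates real feature vectors with contradictory labels, which is why a defense that scores each sample against a clean reference distribution can, in principle, catch it without knowing the attack in advance.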


Application Information

Patent type & authority: Application (China)
IPC (8th ed.): G06F21/56; G06F21/57; G06N20/00; G06N3/08
CPC: G06F21/56; G06F21/57; G06N3/08; G06N20/00
Inventors: Wang Chen (王琛), Chen Jian (陈健), Zhang Xuxin (张旭鑫), Peng Kai (彭凯)
Owner: HUAZHONG UNIV OF SCI & TECH