Deep reinforcement learning policy optimization defense method and device based on imitation learning

A deep reinforcement learning technology, classified under neural learning methods, biological neural network models, and data processing applications. It addresses problems such as attacks on automated decision-making, inaccurate decision results, and exploitable loopholes, and improves the robustness of the model.

Pending Publication Date: 2021-06-01
ZHEJIANG UNIV OF TECH

AI Technical Summary

Problems solved by technology

[0005] To address the problem that existing reinforcement learning is vulnerable to attack in safety-critical decision-making (such as autonomous driving scenarios), or that automated decision-making has exploitable loopholes, which in turn leads to inacc


Embodiment Construction

[0019] In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, and do not limit the protection scope of the present invention.

[0020] Building on robust-learning defense mechanisms, this embodiment provides a deep reinforcement learning policy optimization defense method based on imitation learning, applied mainly to autonomous driving scenarios. The technical idea is as follows: during deep reinforcement learning training in a simulated autonomous driving environment, a policy-poisoning attack causes the learner to learn a wrong policy and therefore to choose bad actions. To counter this, the method uses the imitation learn...


Abstract

The invention discloses a deep reinforcement learning policy optimization defense method and device based on imitation learning. The method comprises the following steps: building an autonomous driving simulation environment for a deep reinforcement learning agent, constructing a target agent based on a deep Q network, and training the target agent by reinforcement learning to optimize the parameters of the deep Q network; using the parameter-optimized deep Q network to generate a sequence of state-action pairs of the target agent over T time steps as expert data, where the action in each state-action pair is the action with the minimum Q value; constructing an adversarial agent based on a generative adversarial network and training it by imitation learning, that is, taking the states in the expert data as the input of the generative adversarial network and the expert data as labels to supervise the optimization of its parameters; and performing adversarial training on the target agent using the states generated by the adversarial agent, then further optimizing the parameters of the deep Q network, thereby achieving a policy optimization defense for deep reinforcement learning.
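The expert-data step in the abstract can be illustrated with a minimal sketch. It assumes only what the abstract states: a trained Q function is queried for T time steps, and each recorded action is the one with the minimum Q value. The names `min_q_action`, `collect_expert_data`, `toy_q`, and `toy_step` are hypothetical, and the one-dimensional toy environment merely stands in for the driving simulator and the deep Q network, neither of which is specified in this record.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


def min_q_action(q_values: Sequence[float]) -> int:
    """Return the index of the action with the smallest Q value.

    Ties resolve to the lowest action index.
    """
    return min(range(len(q_values)), key=lambda a: q_values[a])


def collect_expert_data(
    q_fn: Callable[[State], Sequence[float]],
    step_fn: Callable[[State, int], State],
    initial_state: State,
    horizon: int,
) -> List[Tuple[State, int]]:
    """Roll out `horizon` steps and record (state, min-Q action) pairs."""
    state = initial_state
    pairs: List[Tuple[State, int]] = []
    for _ in range(horizon):
        action = min_q_action(q_fn(state))
        pairs.append((state, action))
        state = step_fn(state, action)
    return pairs


# Hypothetical stand-ins for the trained deep Q network and the simulator.
def toy_q(state: State) -> List[float]:
    """Q values for actions (left, stay, right); higher means closer to 0."""
    x = state[0]
    return [-abs(x - 1.0), -abs(x), -abs(x + 1.0)]


def toy_step(state: State, action: int) -> State:
    """Move the position by -1, 0, or +1 for actions 0, 1, 2."""
    return (state[0] + (action - 1),)


expert_data = collect_expert_data(toy_q, toy_step, (0.0,), horizon=4)
for state, action in expert_data:
    print(state, action)
```

In this toy setting the minimum-Q action always moves the position away from the origin, which mirrors the abstract's choice of the worst-valued action as the behavior the adversarial agent later imitates.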

Description

Technical field

[0001] The invention belongs to the field of defenses for deep reinforcement learning, and in particular relates to a deep reinforcement learning policy optimization defense method and device based on imitation learning.

Background

[0002] Deep reinforcement learning is one of the directions of artificial intelligence that has attracted the most attention in recent years. With its rapid development, reinforcement learning has been widely applied in robot control, game playing, computer vision, autonomous driving, and other fields. To ensure that deep reinforcement learning can be applied safely in safety-critical fields, the key is to analyze and discover vulnerabilities in deep reinforcement learning algorithms and models, so that malicious actors cannot exploit them for illegal gain. Different from the single-step prediction task of traditional machine l...


Application Information

IPC(8): G06N3/04, G06N3/08, G06Q10/04
CPC: G06N3/084, G06Q10/04, G06N3/045
Inventor: 陈晋音, 章燕, 王雪柯, 胡书隆
Owner ZHEJIANG UNIV OF TECH