Intelligent agent adaptive decision generation method and system based on deep reinforcement learning

A deep reinforcement learning-based decision generation method and system, applied in the field of agent adaptive decision generation. It addresses the problems of poor training stability, adaptive behavior that is difficult to determine, and low robustness, so as to improve the training success rate, maintain stability, and provide strong exploration ability and robustness.

Active Publication Date: 2021-10-08
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

[0004] Because the actual environment of an agent is often dynamic and uncertain, environmental changes are difficult to predict in advance, so the adaptive behavior appropriate to a given change is difficult to determine. In addition, existing agent learning and training processes suffer from poor training stability and low robustness.


Examples

Embodiment 1

[0043] This embodiment provides a method for generating an agent's adaptive decisions based on deep reinforcement learning, including: obtaining the agent's historical and current environment state information, environment reward information, and decision-making action information; obtaining the environment state information at the next moment; storing all of the obtained information as experience in the replay buffer; training the agent's deep reinforcement learning model; using an ordinary gradient descent optimizer to find the optimal solution of the model during training; and using the unmanned aerial vehicle (UAV) anti-interception task as a carrier to verify the agent's deep reinforcement learning model.
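The experience-storage step described above can be illustrated with a minimal sketch of a uniform replay buffer. This is not the patent's implementation: the names `Transition`, `ReplayBuffer`, `store`, and `sample`, the `done` flag, and the first-in-first-out eviction policy are all assumptions chosen for illustration.

```python
import random
from collections import deque, namedtuple

# One experience tuple (field names are hypothetical):
# state, action, reward, next state, and an episode-termination flag.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform random sampling (illustrative)."""

    def __init__(self, capacity, seed=None):
        # A bounded deque evicts the oldest experience once capacity is reached.
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def store(self, state, action, reward, next_state, done):
        self._storage.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self):
        return len(self._storage)


# Usage: store five transitions in a buffer that holds three, then sample two.
buffer = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buffer.store(state=t, action=0, reward=1.0, next_state=t + 1, done=False)
batch = buffer.sample(batch_size=2)
```

Because the buffer is bounded, only the three most recent transitions remain available for sampling in this usage example.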

[0044] The Soft Actor-Critic (SAC) algorithm used in this embodiment was developed from the DDPG algorithm. It is a deep reinforcement learning algorithm that addresses the high sample complexity and fragility...

Embodiment 2

[0090] This embodiment provides a method for generating an agent's adaptive decisions based on deep reinforcement learning. It differs from Embodiment 1 in that the experience replay in the agent's deep reinforcement learning algorithm adopts a prioritized experience replay strategy.

[0091] When performing parameter updates, the off-policy method adopted by SAC can reuse past experience by repeatedly sampling data from it; this is experience replay. The experience replay mechanism enables online reinforcement learning agents to remember and reuse past experiences. In previous work, experiences were sampled uniformly from the replay buffer; however, this approach simply replays experiences at the same frequency at which they were originally encountered, regardless of their importance. This embodiment therefore combines SAC with prioritized experience replay (Prioritized Experience Replay, PER...
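The contrast with uniform sampling can be sketched as follows. Since the paragraph above is truncated, the patent's exact prioritization rule is not visible; this sketch assumes the commonly published proportional variant of prioritized experience replay, where transition i is drawn with probability P(i) = p_i^alpha / sum_k p_k^alpha and corrected with importance weights w_i = (N * P(i))^(-beta). The function names, the default values of `alpha` and `beta`, and the per-batch weight normalization are illustrative assumptions.

```python
import random


def per_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Sample indices by priority and return normalized importance weights."""
    rng = rng or random.Random()
    probs = per_probabilities(priorities, alpha)
    n = len(priorities)
    # Draw indices with replacement, weighted by the sampling probabilities.
    indices = rng.choices(range(n), weights=probs, k=batch_size)
    # w_i = (N * P(i))**(-beta), scaled so the largest weight in the batch is 1.
    weights = [(n * probs[i]) ** (-beta) for i in indices]
    max_w = max(weights)
    return indices, [w / max_w for w in weights]


# Usage: the third transition has twice the priority of the other two.
probs = per_probabilities([1.0, 1.0, 2.0], alpha=1.0)
indices, weights = per_sample([1.0, 1.0, 2.0], batch_size=4, rng=random.Random(0))
```

With `alpha=0` every transition is equally likely, which recovers the uniform replay described for previous work; larger `alpha` replays high-priority transitions more often.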

Embodiment 3

[0124] This embodiment provides an agent adaptive decision generation method based on deep reinforcement learning. It differs from Embodiment 1 and Embodiment 2 in that the experience replay in the agent's deep reinforcement learning algorithm adopts the Emphasizing Recent Experience replay strategy.

[0125] In Embodiment 2, replaying important experiences more frequently further improved the sampling efficiency of the experience replay mechanism in the SAC algorithm, thereby improving learning efficiency and convergence speed; the proposed SAC algorithm with prioritized experience replay was tested on the UAV anti-interception task to demonstrate its effectiveness. This embodiment adopts another method for improving the experience replay mechanism, the Emphasizing Recent Experience (ERE) replay strategy, and proposes the SAC+ERE algorithm.

[0126] The Emphasis on Recent Experience (ERE) repla...
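Because the paragraph above is truncated, the patent's own description of ERE is not available here. As a rough sketch, the published ERE formulation shrinks the sampling range over the updates that follow an episode: for the k-th of K updates, mini-batches are drawn uniformly from the most recent c_k = max(N * eta^(1000k/K), c_min) transitions, where N is the buffer size. The function names and the default values of `eta` and `c_min` below are assumptions for illustration, not values taken from the patent.

```python
import random


def ere_window(k, num_updates, buffer_size, eta=0.996, c_min=5000):
    """Number of most-recent transitions eligible for the k-th update (1-based)."""
    c_k = int(buffer_size * eta ** (k * 1000 / num_updates))
    # Never go below c_min, and never exceed what the buffer actually holds.
    return min(buffer_size, max(c_k, c_min))


def ere_sample_indices(k, num_updates, buffer_size, batch_size, rng=None, **kw):
    """Uniformly sample indices from the newest c_k entries.

    Index buffer_size - 1 is assumed to be the most recent transition.
    """
    rng = rng or random.Random()
    c_k = ere_window(k, num_updates, buffer_size, **kw)
    start = buffer_size - c_k
    return [rng.randrange(start, buffer_size) for _ in range(batch_size)]


# Usage: the window narrows toward recent data as k grows from 1 to K.
windows = [ere_window(k, num_updates=10, buffer_size=100_000) for k in range(1, 11)]
idx = ere_sample_indices(10, 10, 100_000, batch_size=8, rng=random.Random(0))
```

Setting `eta=1.0` keeps the window at the full buffer for every update, which reduces to uniform sampling; smaller values of `eta` place more emphasis on recent experience.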


Abstract

The invention provides an intelligent agent adaptive decision generation method and system based on deep reinforcement learning. The agent adaptive decision problem is studied using the deep reinforcement learning Soft Actor-Critic (SAC) algorithm; the SAC algorithm is improved to address problems that occur during training, and the SAC + PER, SAC + ERE, and SAC + PER + ERE algorithms are provided. The agent adaptive decision-making problem is solved by combining the perception ability of deep learning with the decision-making ability of reinforcement learning: the agent is trained with a deep reinforcement learning algorithm so that it summarizes experience while interacting with the environment and thereby forms an understanding of how to apply specific behaviors. The anti-interception task of an unmanned aerial vehicle in a simulation environment is taken as a carrier to verify the effectiveness of the algorithms.

Description

Technical field

[0001] The disclosure belongs to the technical field of intelligent optimization, and in particular relates to a method and system for generating adaptive decisions of an agent based on deep reinforcement learning.

Background

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] Reinforcement learning is particularly important in the field of machine learning. It mainly studies how to maximize the expected return in response to real-time changes in the environment. For general reinforcement learning algorithms, the goal of agent learning is to learn a policy that maximizes the expected cumulative return, where the agent is the entity that performs reinforcement learning.

[0004] Since the actual environment of the agent is often dynamic and uncertain, it is difficult to predict the environmental change in advance, so ...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00, G06N3/04, G06N3/08
CPC: G06N20/00, G06N3/08, G06N3/045, Y02T10/40
Inventor: 宋勇程艳庞豹袁宪锋许庆阳巩志
Owner: SHANDONG UNIV