Intelligent agent adaptive decision generation method and system based on deep reinforcement learning

A reinforcement learning technology and generation system, applied in the field of agent adaptive decision generation, that addresses poor stability of the training process, difficulty in determining adaptive behavior, and low robustness, so as to improve the training success rate, maintain stability, and achieve strong exploration ability and robustness.

Active Publication Date: 2021-10-08

SHANDONG UNIV



Problems solved by technology

[0004] Because the actual environment of the agent is often dynamic and uncertain, environmental changes are difficult to predict in advance, and it is therefore difficult to determine the adaptive behavior appropriate to those changes. In addition, in existing agent learning and training processes, the stability of training is poor and the robustness is low.

Examples


Embodiment 1

[0043] This embodiment provides a method for generating an agent's adaptive decisions based on deep reinforcement learning, including: obtaining the historical and current environment state information, environment reward information, and decision-making action information of the agent; obtaining the environment state information at the next moment; storing all of the obtained information as experience in the replay buffer; training the deep reinforcement learning model of the agent, using an ordinary gradient descent optimizer to find the optimal solution of the model during training; and verifying the deep reinforcement learning model of the agent, with the UAV anti-interception task as the carrier.
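The experience-storage step described above can be sketched as a fixed-capacity buffer with uniform sampling. This is only an illustrative sketch: the class name `ReplayBuffer`, the tuple layout `(state, action, reward, next_state, done)`, and the `done` flag are assumptions made for the example and are not taken from the patent text.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# Assumed layout of one experience: (state, action, reward, next_state, done).
Transition = Tuple[List[float], List[float], float, List[float], bool]


class ReplayBuffer:
    """Fixed-capacity experience store; the oldest entries are evicted first."""

    def __init__(self, capacity: int) -> None:
        self._storage: Deque[Transition] = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done) -> None:
        # Store the current state, the chosen action, the reward received,
        # and the environment state at the next moment as one experience.
        self._storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Uniform sampling without replacement from everything stored so far.
        return rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)
```

A training loop would call `add` after every environment step and `sample` before each gradient update.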

[0044] The Soft Actor-Critic (SAC) algorithm used in this embodiment is developed on the basis of the DDPG algorithm. It is a new deep reinforcement learning algorithm that addresses high sample complexity and fragility...
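For orientation, SAC is usually stated as a maximum-entropy method. The objective below is the commonly published form, not a formula quoted from this patent; the symbols (policy $\pi$, state-action distribution $\rho_\pi$, reward $r$, temperature $\alpha$, entropy $\mathcal{H}$) are notation assumed for this sketch.

```latex
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
         \left[ r(s_t, a_t) + \alpha \, \mathcal{H}\bigl(\pi(\cdot \mid s_t)\bigr) \right]
```

The entropy term rewards the policy for staying stochastic, which is the usual explanation for the exploration ability attributed to SAC.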

Embodiment 2

[0090] This embodiment provides a method for generating an agent's adaptive decisions based on deep reinforcement learning. The difference from Embodiment 1 is that in this embodiment, the experience replay in the agent's deep reinforcement learning algorithm adopts a prioritized experience replay strategy.

[0091] When performing parameter updates, the off-policy method adopted by SAC can reuse past experience by repeatedly sampling data from it; this is experience replay (Experience Replay). The experience replay mechanism enables online reinforcement learning agents to remember and reuse past experiences. In previous research, experiences were sampled uniformly from the replay buffer; however, this approach simply replays experiences with the same probability as they were originally experienced, regardless of their importance. In this embodiment, SAC is combined with prioritized experience replay (Prioritized Experience Replay, PER...
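The contrast between uniform and importance-aware sampling can be illustrated with a small sketch. The patent text is truncated at this point, so the details below are assumptions: the sketch uses the widely described proportional variant of prioritized replay (sampling probability proportional to priority raised to `alpha`, with importance-sampling weights controlled by `beta`), stores priorities in a plain list instead of a sum tree for brevity, and all names (`PrioritizedReplay`, `alpha`, `beta`, `eps`) are hypothetical.

```python
import random
from typing import Any, List, Sequence, Tuple


class PrioritizedReplay:
    """List-based proportional prioritized replay (illustrative, O(n) sampling)."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-6) -> None:
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data: List[Any] = []
        self.priorities: List[float] = []
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition: Any) -> None:
        # New experience gets the current maximum priority so it is replayed soon.
        max_p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(max_p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float,
               rng: random.Random) -> Tuple[List[int], List[Any], List[float]]:
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idx = rng.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights correct the bias of non-uniform sampling;
        # they are normalised by the largest weight in this batch.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        return idx, [self.data[i] for i in idx], [w / max_w for w in weights]

    def update_priorities(self, indices: Sequence[int],
                          td_errors: Sequence[float]) -> None:
        # Priority tracks the magnitude of the most recent TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

After each gradient step, the learner would pass the sampled indices and their new TD errors to `update_priorities`, and would multiply each sample's loss by its returned weight.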

Embodiment 3

[0124] This embodiment provides an agent adaptive decision generation method based on deep reinforcement learning. The difference from Embodiment 1 and Embodiment 2 is that in this embodiment, the experience replay in the agent's deep reinforcement learning algorithm adopts the Emphasizing Recent Experience replay strategy.

[0125] In Embodiment 2, replaying important experiences more frequently further improves the sampling efficiency of the experience replay mechanism in the SAC algorithm, thereby improving learning efficiency and convergence speed; the proposed SAC algorithm with prioritized experience replay was tested on the UAV anti-interception task, demonstrating that the algorithm is effective. This embodiment adopts another method for improving the experience replay mechanism, the Emphasizing Recent Experience (ERE) replay strategy, and proposes the SAC+ERE algorithm.

[0126] The Emphasis on Recent Experience (ERE) repla...
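Since the description of ERE is cut off here, the following sketch relies on the schedule from the published ERE formulation as commonly described, and it is an assumption that the patent uses the same one. For the k-th of K consecutive updates, mini-batches are drawn uniformly from only the most recent c_k = max(N * eta^(k * 1000 / K), c_min) experiences, so later updates in a round focus on newer data. The function names, the default values of `eta` and `c_min`, and the convention that index 0 is the oldest experience are all hypothetical choices for this example.

```python
import random
from typing import List


def ere_window(buffer_size: int, k: int, num_updates: int,
               eta: float = 0.996, c_min: int = 5000) -> int:
    """Number of most recent experiences eligible for the k-th update (1-based)."""
    c_k = int(buffer_size * eta ** (k * 1000 / num_updates))
    # Never shrink below c_min, and never exceed what the buffer actually holds.
    return min(buffer_size, max(c_k, c_min))


def ere_sample_indices(buffer_size: int, batch_size: int, k: int,
                       num_updates: int, rng: random.Random,
                       eta: float = 0.996, c_min: int = 5000) -> List[int]:
    """Uniformly sample indices from the newest window; index 0 is the oldest."""
    window = ere_window(buffer_size, k, num_updates, eta, c_min)
    start = buffer_size - window
    return [rng.randrange(start, buffer_size) for _ in range(batch_size)]
```

With `eta = 1.0` the window always covers the whole buffer, which recovers ordinary uniform replay.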


Abstract

The invention provides an intelligent agent adaptive decision generation method and system based on deep reinforcement learning. The agent adaptive decision problem is studied on the basis of the deep reinforcement learning Soft Actor-Critic (SAC) algorithm; the SAC algorithm is improved to address problems that occur during training, and the SAC + PER, SAC + ERE and SAC + PER + ERE algorithms are provided. The agent adaptive decision-making problem is solved by using the powerful perception ability of deep learning and the efficient decision-making ability of reinforcement learning. The agent is trained through a deep reinforcement learning algorithm so that it summarizes experience while interacting with the environment and thereby forms an understanding of how to apply specific behaviors. An anti-interception task of an unmanned aerial vehicle in a simulation environment is taken as the carrier to verify the effectiveness of the algorithms.

Description

Technical field

[0001] The disclosure belongs to the technical field of intelligent optimization, and in particular relates to a method and system for generating adaptive decisions of an agent based on deep reinforcement learning.

Background

[0002] The statements in this section merely provide background information related to the present disclosure and do not necessarily constitute prior art.

[0003] Reinforcement learning is particularly important in the field of machine learning. It mainly studies how to maximize the expected return in response to real-time changes in the environment. For general reinforcement learning algorithms, the goal of agent learning is to learn a policy that maximizes the expected cumulative return, where the agent is the entity that performs reinforcement learning.

[0004] Because the actual environment of the agent is often dynamic and uncertain, environmental changes are difficult to predict in advance, so ...


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G06N20/00, G06N3/04, G06N3/08

CPC: G06N20/00, G06N3/08, G06N3/045, Y02T10/40

Inventor: 宋勇程艳庞豹袁宪锋许庆阳巩志

Owner: SHANDONG UNIV
