Strategy selection method based on Actor-Critic framework in deep reinforcement learning

A strategy selection technique, applied in the field of reinforcement learning, that addresses problems such as the added complexity of the training process and increases search ability.

Active Publication Date: 2020-10-13

ZHEJIANG UNIV

Problems solved by technology

In existing approaches, the different low-level strategies are usually first trained on different simple tasks. This requires defining additional tasks, which increases the complexity of the complete training process.


Embodiment Construction

[0026] In order to describe the present invention more specifically, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0027] This embodiment describes the specific scheme for the LunarLander-v2 task. The goal of LunarLander-v2 is to control a simulated lunar lander so that it lands in the specified area at a speed close to 0. The input state s consists of 8 continuous variables representing the position, velocity, angle, angular velocity and ground contact state of the lander. The output action a takes values in the set A={1,2,3,4}, whose elements denote four actions: 1 → do nothing, 2 → fire the left steering engine, 3 → fire the main engine, 4 → fire the right steering engine. As shown in Figure 1, the overall technical solution can be realized through the following specific...
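The state and action description in [0027] can be written down as plain data types. The following is a minimal sketch, not the patent's implementation. The names `Action`, `LanderState` and `to_env_index` are hypothetical, and the split of the 8 state variables into individual fields is an assumption based on the usual LunarLander-v2 layout rather than something stated in the text.

```python
from enum import IntEnum
from typing import NamedTuple


class Action(IntEnum):
    """The action set A = {1, 2, 3, 4} as numbered in paragraph [0027]."""
    DO_NOTHING = 1
    LEFT_ENGINE = 2
    MAIN_ENGINE = 3
    RIGHT_ENGINE = 4


class LanderState(NamedTuple):
    """The 8 state variables; this field breakdown is an assumption."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: float
    right_leg_contact: float


def to_env_index(action: Action) -> int:
    """Map the 1-based numbering of [0027] to a 0-based index.

    Assumption: the simulator numbers its discrete actions from 0.
    """
    return int(action) - 1


state = LanderState(0.0, 1.4, 0.0, -0.5, 0.0, 0.0, 0.0, 0.0)
chosen = Action.MAIN_ENGINE
```

Keeping the patent's 1-based numbering in one enum and converting at the boundary avoids off-by-one mistakes when the text and the simulator disagree on indexing.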


Abstract

The invention discloses a strategy selection method based on the Actor-Critic framework in deep reinforcement learning. In this method, a plurality of strategies are kept in the Actor at the same time. During reinforcement learning training, the action-state value function in the Critic is used to estimate the expected value of each strategy, and according to that expectation an advantageous strategy among them is selected or integrated in real time to be output or updated. The aim is to improve training speed and to generate effective local strategies during training. The method therefore has technical effects such as improving sampling efficiency and increasing the parameter search space for policy-gradient-based reinforcement learning algorithms that use the Actor-Critic framework.
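One plausible reading of the selection step in the abstract is sketched here: each strategy's expected value in a state is the Critic's action values weighted by that strategy's action probabilities, and the strategy with the highest expectation is chosen. This excerpt does not give the patent's exact rule, so the function names, the use of a discrete action distribution, and the argmax choice are all assumptions made for illustration. The integration variant is not shown.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a strategy maps a state to action probabilities,
# and the Critic maps a state to one Q-value per action.
Strategy = Callable[[Sequence[float]], List[float]]
QFunction = Callable[[Sequence[float]], List[float]]


def expected_value(action_probs: Sequence[float], q_values: Sequence[float]) -> float:
    """Expectation of Q(s, a) under one strategy's action distribution."""
    return sum(p * q for p, q in zip(action_probs, q_values))


def select_strategy(
    strategies: Sequence[Strategy],
    q_function: QFunction,
    state: Sequence[float],
) -> Tuple[int, List[float]]:
    """Return the index of the highest-valued strategy and all scores.

    Assumption: 'selecting an advantageous strategy' is an argmax over
    the Critic's expected values.
    """
    q_values = q_function(state)
    scores = [expected_value(pi(state), q_values) for pi in strategies]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy usage with two fixed strategies over four actions.
always_idle = lambda s: [1.0, 0.0, 0.0, 0.0]
always_main = lambda s: [0.0, 0.0, 1.0, 0.0]
toy_critic = lambda s: [0.0, 0.1, 1.0, 0.2]

best_index, scores = select_strategy([always_idle, always_main], toy_critic, [0.0] * 8)
```

Because the score depends on the state, a different strategy can win in different regions of the state space, which is consistent with the abstract's remark about effective local strategies.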

Description

Technical field

[0001] The invention belongs to the technical field of reinforcement learning, and in particular relates to a strategy selection method based on an Actor-Critic framework in deep reinforcement learning.

Background

[0002] A reinforcement learning agent interacts with an environment by receiving observations that characterize the current state of the environment and, in response, performing actions from a predetermined set of actions; some reinforcement learning agents use neural networks to select the action performed.

[0003] Actor-Critic is a commonly used framework in reinforcement learning, in which the Actor is responsible for outputting the action currently executed, and the Critic is responsible for estimating the value of the current action or the value of the current state. Usually there is one Actor and one Critic, and the parameters of both are updated simultaneously according to the reinforcement learning algorithm used, to train an excellent reinforceme...
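For readers unfamiliar with the single-Actor, single-Critic setup described in [0003], a textbook one-step update is sketched here. It is a generic tabular illustration and is not taken from the patent. The softmax Actor, the state-value Critic, and the names `softmax`, `actor_critic_step`, `prefs` and `values` are all assumptions.

```python
import math
from typing import List


def softmax(prefs: List[float]) -> List[float]:
    """Convert action preferences to probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(
    prefs: List[List[float]],   # Actor: per-state action preferences
    values: List[float],        # Critic: per-state value estimates
    s: int, a: int, r: float, s_next: int, done: bool,
    alpha_actor: float = 0.1, alpha_critic: float = 0.1, gamma: float = 0.99,
) -> float:
    """Update Actor and Critic together from one transition; return the TD error."""
    target = r + (0.0 if done else gamma * values[s_next])
    td_error = target - values[s]

    # Critic update: move V(s) toward the one-step target.
    values[s] += alpha_critic * td_error

    # Actor update: gradient of log softmax, scaled by the TD error.
    probs = softmax(prefs[s])
    for b in range(len(prefs[s])):
        grad_log = (1.0 if b == a else 0.0) - probs[b]
        prefs[s][b] += alpha_actor * td_error * grad_log
    return td_error


# Toy usage: two states, two actions, one rewarded terminal transition.
prefs = [[0.0, 0.0], [0.0, 0.0]]
values = [0.0, 0.0]
td = actor_critic_step(prefs, values, s=0, a=1, r=1.0, s_next=1, done=True)
```

After this single step the preference for the rewarded action rises and the other falls, while the Critic's estimate for the visited state moves toward the observed return.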


Application Information


Patent Type & Authority: Applications (China)

IPC(8): G05B13/04, G06N3/04, G06N3/08

CPC: G05B13/042, G06N3/084, G06N3/045

Inventor: 李红, 杨国青, 钱广一, 吕攀, 吴朝晖

Owner: ZHEJIANG UNIV
