
Strategy selection method based on Actor-Critic framework in deep reinforcement learning

A reinforcement learning strategy selection technique, applied in the field of reinforcement learning, that addresses problems such as the increased complexity of the training process and achieves the effect of increasing search capability

Active Publication Date: 2020-10-13
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

However, in this approach the different low-level strategies are usually first trained on different simple tasks, which requires defining additional tasks and increases the complexity of the overall training process.



Examples


Embodiment Construction

[0026] In order to describe the present invention more specifically, the technical solutions of the present invention will be described in detail below in conjunction with the accompanying drawings and specific embodiments.

[0027] This embodiment describes the specific scheme for the LunarLander-v2 task. The goal of LunarLander-v2 is to control a simulated lunar lander so that it completes its landing mission, touching down in the designated landing area at a speed close to 0. The input state s consists of 8 continuous variables representing the position, velocity, angle, angular velocity and ground-contact state of the lander. The output action a takes its value from the set A={1,2,3,4}, whose elements denote four actions: 1 → do nothing, 2 → fire the left steering engine, 3 → fire the main engine, 4 → fire the right steering engine. As shown in Figure 1, the overall technical solution can be realized through the following specific...
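The state and action encoding described in [0027] can be sketched in Python as follows. This is only an illustration: the names `LanderAction`, `LanderState` and `to_env_index` are hypothetical, the split of the 8 state variables into individual fields is an assumption (the text only lists the quantities they represent), and the 0-based conversion assumes a simulator that numbers its discrete actions from 0.

```python
from dataclasses import dataclass
from enum import IntEnum


class LanderAction(IntEnum):
    """The four actions, numbered 1..4 as in the set A of the text."""
    DO_NOTHING = 1
    FIRE_LEFT_ENGINE = 2
    FIRE_MAIN_ENGINE = 3
    FIRE_RIGHT_ENGINE = 4


@dataclass(frozen=True)
class LanderState:
    """The 8 continuous state variables (field layout is an assumption)."""
    x: float
    y: float
    vx: float
    vy: float
    angle: float
    angular_velocity: float
    left_leg_contact: float
    right_leg_contact: float

    @classmethod
    def from_vector(cls, s):
        # Reject anything that is not an 8-element state vector.
        if len(s) != 8:
            raise ValueError("expected 8 state variables")
        return cls(*(float(v) for v in s))


def to_env_index(action: LanderAction) -> int:
    """Map the 1-based label of the text to a 0-based action index (assumption)."""
    return int(action) - 1


state = LanderState.from_vector([0.0, 1.4, 0.1, -0.5, 0.02, 0.0, 0.0, 0.0])
env_action = to_env_index(LanderAction.FIRE_MAIN_ENGINE)
```

Keeping the 1-based labels in an enum and converting only at the simulator boundary lets the rest of the code use the numbering of the text unchanged.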



Abstract

The invention discloses a strategy selection method based on the Actor-Critic framework in deep reinforcement learning. In this method, a plurality of strategies are kept in the Actor at the same time. During reinforcement learning training, the action-state value function in the Critic is used to estimate the expected value of each strategy, and according to these expectations the advantageous strategy among them is selected or integrated in real time to be output or updated. The purpose is to improve the training speed and to generate effective local strategies during training. The method therefore has the technical effects of improving sampling efficiency, increasing the parameter search space and the like for policy-gradient-based reinforcement learning algorithms that use the Actor-Critic framework.
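The selection step summarized in the abstract can be illustrated with a minimal Python sketch. It assumes a discrete action set, that each strategy is given as a probability distribution over actions in the current state, and that the expected value of a strategy is the probability-weighted sum of the Critic's action values. The softmax weighting in `integrate_policies` is one possible reading of "integrated" and is an assumption, not the rule of the patent. All function names are hypothetical.

```python
import math


def expected_policy_value(action_probs, q_values):
    """Expected Q-value of one strategy: sum over a of pi(a|s) * Q(s, a)."""
    return sum(p * q for p, q in zip(action_probs, q_values))


def select_policy(policies, q_values):
    """Return the index of the strategy with the highest expected value."""
    values = [expected_policy_value(p, q_values) for p in policies]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values


def integrate_policies(policies, q_values, temperature=1.0):
    """Blend the strategies with softmax weights over their expected values.

    Assumption: this weighting scheme is illustrative only.
    """
    _, values = select_policy(policies, q_values)
    top = max(values)  # subtract the maximum for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    total = sum(weights)
    return [
        sum(w / total * p[a] for w, p in zip(weights, policies))
        for a in range(len(q_values))
    ]


# Critic estimates Q(s, a) for the four actions in some state s (made-up numbers).
q_values = [0.0, 1.0, 2.0, 1.0]
# Three candidate strategies kept in the Actor (made-up distributions).
policies = [
    [0.25, 0.25, 0.25, 0.25],
    [0.1, 0.1, 0.7, 0.1],
    [0.7, 0.1, 0.1, 0.1],
]
best, values = select_policy(policies, q_values)
mixed = integrate_policies(policies, q_values)
```

Here the second strategy puts most of its probability on the action the Critic rates highest, so it receives the largest expected value and is the one selected.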

Description

Technical field

[0001] The invention belongs to the technical field of reinforcement learning, and in particular relates to a strategy selection method based on the Actor-Critic framework in deep reinforcement learning.

Background art

[0002] A reinforcement learning agent interacts with an environment by receiving observations that characterize the current state of the environment and, in response, performing actions from a predetermined set of actions; some reinforcement learning agents use neural networks to select the action to be performed.

[0003] Actor-Critic is a commonly used framework in reinforcement learning, where the Actor is responsible for outputting the action currently to be executed, and the Critic is responsible for estimating the value of the current action or the value of the current state; usually there is one Actor and one Critic, and the parameters of both are updated simultaneously according to the reinforcement learning algorithm used, in order to train an excellent reinforceme...
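The division of labor between Actor and Critic described in the background can be illustrated with a generic, textbook-style one-step update. The sketch below is tabular (no neural networks), uses a softmax policy over action preferences and a state-value Critic, and is not the method of this patent. The function names, learning rates and discount factor are all assumptions chosen for illustration.

```python
import math


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(prefs, values, s, a, r, s_next, done,
                      gamma=0.99, actor_lr=0.1, critic_lr=0.1):
    """Update Actor preferences and Critic values in place; return the TD error."""
    # Critic: one-step temporal-difference target and error.
    target = r if done else r + gamma * values[s_next]
    td_error = target - values[s]
    # Actor: probabilities are taken before any parameter changes.
    probs = softmax(prefs[s])
    values[s] += critic_lr * td_error
    # Gradient of log softmax: (1 - pi) for the taken action, (-pi) for the others.
    for b in range(len(probs)):
        grad = (1.0 if b == a else 0.0) - probs[b]
        prefs[s][b] += actor_lr * td_error * grad
    return td_error


# Two states, two actions, everything initialized to zero.
prefs = [[0.0, 0.0], [0.0, 0.0]]
values = [0.0, 0.0]
td = actor_critic_step(prefs, values, s=0, a=1, r=1.0, s_next=1, done=False)
```

A positive TD error raises the preference for the action just taken and lowers the others, while the Critic's estimate for the visited state moves toward the observed target.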


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G05B13/04, G06N3/04, G06N3/08
CPC: G05B13/042, G06N3/084, G06N3/045
Inventor: 李红, 杨国青, 钱广一, 吕攀, 吴朝晖
Owner: ZHEJIANG UNIV