
Method for improving continuous control stability of intelligent agent

A technology for intelligent agents and control stability, applied in the fields of instruments, computing models, artificial life, etc. It addresses problems such as disturbances that are difficult to model, the inability to enhance robustness, and the inability to know the differences between a new environment and the training environment, achieving the effects of enhanced robustness and improved stability.

Pending Publication Date: 2022-03-15
UNIV OF SCI & TECH OF CHINA

AI Technical Summary

Problems solved by technology

[0006] However, in many real-world tasks it is often difficult to predict the specific form of a disturbance in advance, and in some tasks it cannot even be assumed that the disturbance follows any particular form.
For example, when a trained robot performs tasks in a new environment, it may not be possible to know the differences between the new environment and the training environment; when those differences arise from complex air flow or from random errors in the manufacturing of mechanical components, it may even be difficult to model the perturbations effectively.
In other words, in many task scenarios the robustness of the trained policy cannot be enhanced by introducing the above RMDP-based methods in the training environment.


Examples


Embodiment 1

[0088] This embodiment of the present invention provides a method for improving the stability of continuous control of an agent, which may be called the conservative-state policy optimization method (SC-SAC). Building on the existing conservative-state Markov decision process and the corresponding conservative-state policy iteration and value iteration algorithms, it forms a new robust reinforcement learning method consisting of two parts, a conservative-state policy evaluation network and a conservative-state policy improvement network, and uses a gradient regularization term to solve its optimization objective efficiently.
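For illustration only, the two components named above can be pictured as a critic network Q(s, a) and a stochastic Gaussian policy network, as is standard in SAC-style methods. The sketch below is an assumption about how such networks might be defined; the class names, layer sizes, and the PyTorch framework choice are illustrative and not taken from the patent.

```python
# Illustrative sketch (not the patent's implementation): a scoring network Q(s, a)
# and a squashed-Gaussian policy network, as used in SAC-style agents.
import torch
import torch.nn as nn


class QNetwork(nn.Module):
    """Conservative-state policy evaluation network (critic): maps (state, action) to a scalar score."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class GaussianPolicy(nn.Module):
    """Conservative-state policy improvement network (actor): outputs a tanh-squashed Gaussian action."""
    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Linear(hidden, action_dim)

    def forward(self, state):
        h = self.trunk(state)
        mean, log_std = self.mean(h), self.log_std(h).clamp(-20, 2)
        dist = torch.distributions.Normal(mean, log_std.exp())
        raw = dist.rsample()                         # reparameterized sample
        action = torch.tanh(raw)                     # squash into the continuous action range
        # log-probability with the tanh change-of-variables correction
        log_prob = (dist.log_prob(raw) - torch.log(1 - action.pow(2) + 1e-6)).sum(-1, keepdim=True)
        return action, log_prob
```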

[0089] The conservative-state policy evaluation network serves as the agent's scoring network; the scoring function Q (the critic) is trained by minimizing the following objective function (1):

[0090]

[0091] where the term in objective (1) is defined as:

[0092]

[0093]

[0094] The conservative-state policy improvement network...
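Since objective (1) and the definitions in paragraphs [0090]-[0093] are not reproduced in this extract, the following is only a hedged sketch of the general shape such updates could take: a SAC-style Bellman regression for the critic plus a gradient-norm penalty on Q with respect to the state (one possible reading of the "gradient regularization term" mentioned above), and an entropy-regularized improvement step for the policy. The hyperparameters `gamma`, `alpha`, and `lam` are illustrative assumptions, not values from the patent.

```python
# Hedged sketch only; objective (1) from the patent is not shown in this extract.
import torch
import torch.nn.functional as F


def critic_loss(q_net, q_target, policy, batch, gamma=0.99, alpha=0.2, lam=1e-3):
    """Assumed SAC-style evaluation step with an added state-gradient regularizer."""
    s, a, r, s_next, done = batch            # r and done are expected with shape [batch, 1]
    with torch.no_grad():
        a_next, logp_next = policy(s_next)
        target = r + gamma * (1.0 - done) * (q_target(s_next, a_next) - alpha * logp_next)
    bellman = F.mse_loss(q_net(s, a), target)

    # Gradient regularization term (assumption): penalize the sensitivity of Q to the state,
    # encouraging conservative value estimates in a neighborhood of visited states.
    s_req = s.clone().requires_grad_(True)
    grad_s = torch.autograd.grad(q_net(s_req, a).sum(), s_req, create_graph=True)[0]
    reg = grad_s.pow(2).sum(dim=-1).mean()

    return bellman + lam * reg


def policy_loss(q_net, policy, batch, alpha=0.2):
    """Assumed entropy-regularized improvement step (a maximization, so negated for a minimizer)."""
    s = batch[0]
    a, logp = policy(s)
    return (alpha * logp - q_net(s, a)).mean()
```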

Embodiment 2

[0124] This embodiment provides a method for improving the stability of continuous control of an agent, comprising:

[0125] Preparation phase:

[0126] Select the current task to be tested and split it into two parts, the agent and the environment: the agent receives the state fed back by the environment and selects an action, and the environment receives that action and determines the new state. Then abstract the actions the agent can perform and define the rewards the agent receives; maximizing the cumulative reward is the goal of the present invention.
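A minimal illustration of this agent/environment split, using the Gym-style interface that is common in continuous-control work (the environment name below is a placeholder, not one specified by the patent):

```python
# Illustrative agent/environment split; any MuJoCo-style task exposes the same interface.
import gymnasium as gym

env = gym.make("HalfCheetah-v4")            # placeholder continuous-control task

state, _ = env.reset()
total_reward = 0.0
for _ in range(1000):
    action = env.action_space.sample()      # a trained policy would replace this random choice
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward                  # the training goal is to maximize this cumulative reward
    if terminated or truncated:
        state, _ = env.reset()
```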

[0127] Training phase:

[0128] Implement the above pseudo-code with deep learning methods and deploy it in the agent; the agent interacts with the training environment following the procedure in the pseudo-code, the data collected through interaction is used to train the policy, and this training is repeated for a period of time;
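The patent's pseudo-code is not reproduced in this extract; the loop below is only an assumed sketch of such an interaction-and-training cycle, reusing the illustrative networks and loss functions defined above:

```python
# Assumed training-loop sketch (not the patent's pseudo-code).
import random
from collections import deque

import numpy as np
import torch

replay_pool = deque(maxlen=1_000_000)       # experience replay pool


def train(env, policy, q_net, q_target, policy_opt, q_opt, steps=100_000, batch_size=256):
    state, _ = env.reset()
    for _ in range(steps):
        with torch.no_grad():
            action, _ = policy(torch.as_tensor(state, dtype=torch.float32))
        next_state, reward, terminated, truncated, _ = env.step(action.numpy())
        replay_pool.append((state, action.numpy(), reward, next_state, float(terminated)))
        state = next_state if not (terminated or truncated) else env.reset()[0]

        if len(replay_pool) < batch_size:
            continue
        s, a, r, s2, d = (torch.as_tensor(np.array(x), dtype=torch.float32)
                          for x in zip(*random.sample(replay_pool, batch_size)))
        batch = (s, a, r.unsqueeze(-1), s2, d.unsqueeze(-1))

        q_opt.zero_grad()
        critic_loss(q_net, q_target, policy, batch).backward()
        q_opt.step()

        policy_opt.zero_grad()
        policy_loss(q_net, policy, batch).backward()
        policy_opt.step()

        # Polyak averaging of the target critic, as is standard in SAC-style methods.
        with torch.no_grad():
            for p, p_t in zip(q_net.parameters(), q_target.parameters()):
                p_t.mul_(0.995).add_(0.005 * p)
```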

[0129] Verification phase:

[0130] Deploy the learned policy obtaine...
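The text of the verification phase is truncated above; for illustration only, deployment could amount to rolling out the fixed trained policy in the test environment (possibly a perturbed variant of the training environment) and recording the return, as in the assumed sketch below:

```python
# Assumed verification sketch: roll out the trained policy with no further updates.
import torch


def evaluate(env, policy, episodes=10):
    returns = []
    for _ in range(episodes):
        state, _ = env.reset()
        done, total = False, 0.0
        while not done:
            with torch.no_grad():
                action, _ = policy(torch.as_tensor(state, dtype=torch.float32))
            state, reward, terminated, truncated, _ = env.step(action.numpy())
            total += reward
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)       # average return indicates stability under the test conditions
```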


Abstract

The invention discloses a method for improving the continuous control stability of an intelligent agent. The method comprises: step 1, having the agent interact with the environment according to a preset continuous control task, collecting the interaction data, and placing it into an experience replay pool as training data; step 2, training the conservative-state scoring network by minimizing the objective function of the conservative-state policy evaluation module; step 3, computing the objective function of the conservative-state policy improvement module with the trained conservative-state scoring network, and training the policy function of the conservative-state policy network by maximizing it, until the most stable policy is obtained; and step 4, having the agent execute subsequent continuous control using that most stable policy. The method effectively enhances the robustness of the policy obtained through the agent's reinforcement learning training when it is transferred from the training environment to the real environment, and improves the stability of the agent's continuous control.

Description

Technical field

[0001] The invention relates to the field of continuous control of intelligent agents, and in particular to a method for improving the stability of continuous control of intelligent agents.

Background technology

[0002] Deep reinforcement learning algorithms have achieved great success in areas such as robot control and game intelligence. Tasks represented by simulated robot control usually involve continuous state and action spaces, so this type of problem is usually abstracted as a continuous control problem in agent reinforcement learning. Taking the classic soft actor-critic (SAC) algorithm for continuous control as an example, it effectively improves sample reuse and training stability, has been applied to a series of simulated robot control tasks represented by the MuJoCo simulation environment, and has achieved excellent performance in multiple...

Claims


Application Information

Patent Type & Authority: Application (China)
IPC (8): G06F30/27; G06N3/00
CPC: G06F30/27; G06N3/008
Inventors: 王杰, 匡宇飞, 周祺, 周文罡
Owner: UNIV OF SCI & TECH OF CHINA