Method for improving continuous control stability of intelligent agent
A technology relating to intelligent agents and stability, applied in the fields of instruments, computing models, artificial life, etc. It solves problems such as difficult modeling, the inability to enhance robustness, and the inability to know how a new environment differs from the training environment, achieving the effects of enhanced robustness and improved stability.
Examples
Embodiment 1
[0088] An embodiment of the present invention provides a method for improving the continuous control stability of an agent, which may be called the state-conservative policy optimization method (SC-SAC). It builds on the existing state-conservative Markov decision process and the corresponding state-conservative policy iteration and value iteration algorithms to form a new robust reinforcement learning method. The method comprises two parts, a state-conservative policy evaluation network and a state-conservative policy improvement network, and uses a gradient regularization term to solve its optimization objective efficiently.
[0089] Here, the state-conservative policy evaluation network serves as the agent's scoring (critic) network, and the scoring function Q is trained by minimizing the following objective function (1):
[0090]
[0091] where the target term in objective function (1) is defined as:
[0092]
[0093]
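Objective function (1) and the definitions above are not reproduced in this text. The sketch below is therefore only an illustration of how a state-conservative critic update with a gradient regularization term might be implemented in PyTorch; the network sizes, the coefficient names, and the use of a gradient-norm penalty as the conservative term are assumptions, not the patent's exact formulas.

```python
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Scoring (critic) network Q(s, a) implemented as a small MLP."""

    def __init__(self, state_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


def conservative_critic_loss(critic, target_q, state, action, reward, done,
                             gamma=0.99, reg_coef=1e-3):
    """TD loss plus a gradient-norm penalty on the state input (an assumed
    form of the conservative term); all tensors have shape [batch, 1] except
    state/action, and `target_q` is the precomputed value for the next state."""
    state = state.detach().requires_grad_(True)
    q = critic(state, action)
    td_loss = ((q - (reward + gamma * (1.0 - done) * target_q)) ** 2).mean()
    # Penalizing dQ/ds approximates the worst-case value over a small state
    # neighbourhood, which is how a gradient regularizer can stand in for
    # state conservatism.
    grad = torch.autograd.grad(q.sum(), state, create_graph=True)[0]
    return td_loss + reg_coef * grad.norm(dim=-1).mean()
```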
[0094] The state-conservative policy improvement network...
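The description of the policy improvement network is truncated above. Since SC-SAC builds on Soft Actor-Critic, the following hedged sketch shows a standard SAC-style actor update that maximizes the (conservative) critic's value plus an entropy bonus; `actor.sample` and the temperature `alpha` are illustrative assumptions rather than elements recited in the patent.

```python
def actor_loss(actor, critic, state, alpha=0.2):
    """SAC-style policy improvement, written as a loss to be minimized:
    minimizing (alpha * log_prob - Q) maximizes Q plus policy entropy."""
    action, log_prob = actor.sample(state)   # assumed reparameterized sampler
    q = critic(state, action)
    return (alpha * log_prob - q).mean()
```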
Embodiment 2
[0124] This embodiment provides a method for improving the continuous control stability of an agent, comprising:
[0125] Preparation Phase:
[0126] Select the current task to be tested and split it into two parts, the agent and the environment. The agent receives the state fed back by the environment and selects an action; the environment receives the action and determines the new state. The actions the agent can perform are then abstracted, and the rewards received by the agent are set. Maximizing the cumulative reward is the goal of the present invention.
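As an illustration of this preparation phase only, the snippet below splits a continuous-control task into an environment and an agent using the Gymnasium interface; the task id `Walker2d-v4` and the random policy are placeholders, not part of the claimed method.

```python
import gymnasium as gym

# Placeholder task id; any continuous-control environment with the standard
# Gymnasium step/reset interface plays the role of "the environment".
env = gym.make("Walker2d-v4")
state, _ = env.reset()

total_reward = 0.0
for _ in range(1000):
    action = env.action_space.sample()   # a trained agent would choose this
    state, reward, terminated, truncated, _ = env.step(action)
    total_reward += float(reward)        # the quantity to be maximized
    if terminated or truncated:
        state, _ = env.reset()
env.close()
```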
[0127] Training phase:
[0128] Implement the above pseudo-code with a deep learning method and deploy it in the agent. The agent interacts with the training environment following the corresponding procedure in the pseudo-code, and the data collected through interaction is used to train the policy; training is repeated for a period of time;
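A minimal sketch of the training phase under the same assumptions as the earlier snippets: the agent interacts with the training environment, stores transitions in a replay buffer, and periodically updates the critic and actor with the losses sketched above. `buffer`, `target_q`, and the optimizer arguments are hypothetical helpers, not components defined in the patent text.

```python
import torch


def train(env, actor, critic, buffer, critic_opt, actor_opt,
          steps=100_000, batch=256):
    """Interact with the training environment and update the networks."""
    state, _ = env.reset()
    for _ in range(steps):
        with torch.no_grad():
            action, _ = actor.sample(torch.as_tensor(state, dtype=torch.float32))
        action = action.numpy()
        next_state, reward, terminated, truncated, _ = env.step(action)
        buffer.add(state, action, reward, next_state, float(terminated))
        state = env.reset()[0] if (terminated or truncated) else next_state

        if len(buffer) >= batch:
            s, a, r, s2, d = buffer.sample(batch)
            # Critic update with the conservative (gradient-regularized) loss;
            # target_q(s2) is an assumed helper computing the Bellman target.
            critic_opt.zero_grad()
            conservative_critic_loss(critic, target_q(s2), s, a, r, d).backward()
            critic_opt.step()
            # Actor update with the SAC-style policy-improvement loss.
            actor_opt.zero_grad()
            actor_loss(actor, critic, s).backward()
            actor_opt.step()
```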
[0129] Verification phase:
[0130] Deploy the learned policy obtained...
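The verification phase is truncated above. As a hedged sketch consistent with the stated goal of testing robustness when the environment differs from the training environment, the learned policy can be run without further training in a test environment and the average episode return recorded; all names below are illustrative.

```python
import torch


def evaluate(env, actor, episodes=10):
    """Run the learned policy without further training and report the mean
    episode return; a drop relative to the training environment indicates a
    lack of robustness to the changed dynamics."""
    returns = []
    for _ in range(episodes):
        state, _ = env.reset()
        done, total = False, 0.0
        while not done:
            with torch.no_grad():
                action, _ = actor.sample(
                    torch.as_tensor(state, dtype=torch.float32))
            state, reward, terminated, truncated, _ = env.step(action.numpy())
            total += float(reward)
            done = terminated or truncated
        returns.append(total)
    return sum(returns) / len(returns)
```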