Policy migration method based on probability

A probability-based policy transfer technique. It addresses the difficulty of making a virtual environment approximate the real environment, which limits policy learning efficiency and policy performance, and it is intended to apply across a broad range of algorithms and engineering settings.

Pending Publication Date: 2022-07-22

BEIJING INST OF CONTROL ENG

AI Technical Summary

Problems solved by technology

Continuous control tasks such as robot operation take place in highly dynamic, uncertain environments. In practice it is difficult to make a virtual environment closely approximate the real one, which limits further gains in policy learning efficiency and policy operation performance.

Embodiment Construction

[0053] To make the above technical solutions easier to understand, they are described in detail below with reference to the accompanying drawings and specific embodiments. These embodiments do not limit the technical solutions of the present application, and the embodiments and their technical features may be combined with one another where no conflict arises.

[0054] The probability-based policy migration method provided by the embodiments of the present application is described in further detail below with reference to the accompanying drawings. Specific implementations may include the following (see, for example, Figures 1 to 3):

[0055] Step 1: virtual environment training data collection. The virtual environment S is a generative model that can generate data: the system state is denoted s, the reward at the current moment is r, and the state at the next moment is s'; ...
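The data collection in Step 1 can be pictured with a minimal sketch. Everything concrete here is an assumption made for illustration: the toy dynamics and reward in `toy_virtual_env`, the explicit action `a` chosen by a `policy` callable, and the function names themselves do not come from the patent text, which is truncated at this point. The sketch only shows the idea of querying a generative model and storing (s, a, r, s') tuples.

```python
import random
from typing import Callable, List, Tuple

Transition = Tuple[float, float, float, float]  # (s, a, r, s')


def toy_virtual_env(s: float, a: float, rng: random.Random) -> Tuple[float, float]:
    """Hypothetical generative model: returns (r, s') for state s and action a."""
    s_next = 0.9 * s + a + rng.gauss(0.0, 0.05)  # noisy linear dynamics (assumed)
    r = -(s_next ** 2)                           # reward for staying near 0 (assumed)
    return r, s_next


def collect_transitions(
    env: Callable[[float, float, random.Random], Tuple[float, float]],
    policy: Callable[[float], float],
    s0: float,
    n_steps: int,
    seed: int = 0,
) -> List[Transition]:
    """Roll the policy out in the virtual environment and record each transition."""
    rng = random.Random(seed)
    data: List[Transition] = []
    s = s0
    for _ in range(n_steps):
        a = policy(s)
        r, s_next = env(s, a, rng)
        data.append((s, a, r, s_next))
        s = s_next  # the next state becomes the current state
    return data


if __name__ == "__main__":
    batch = collect_transitions(toy_virtual_env, lambda s: -0.5 * s, s0=1.0, n_steps=5)
    for transition in batch:
        print(transition)
```

The recorded tuples would then serve as training data for the later steps; how the patent actually structures its data set is not visible in this excerpt.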

Abstract

The invention discloses a probability-based policy migration method and belongs to the technical field of artificial intelligence. The environments of continuous control tasks such as robot operation are highly dynamic and uncertain, so in practice it is difficult to make a virtual environment approximate the real environment. The method constructs a probabilistic Q-function estimator through Monte Carlo dropout and combines it with policy gradient optimization, giving the algorithm the ability to identify environmental uncertainty. Specifically, through virtual environment training data collection, uncertainty decomposition and inference, policy gradient optimization, and real environment operation performance evaluation, the method decomposes and measures environmental uncertainty and improves policy learning efficiency and policy operation performance.
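As an illustration of the Monte Carlo dropout idea named in the abstract, the following sketch keeps dropout active at inference time and treats repeated forward passes as samples of Q(s, a). It is not the patent's estimator: the class name `MCDropoutQ`, the single hidden layer, the random untrained weights, and the default sizes are all assumptions chosen to keep the example small and free of third-party dependencies.

```python
import random
import statistics
from typing import List, Sequence, Tuple


class MCDropoutQ:
    """Tiny one-hidden-layer Q-network whose dropout stays on at inference time."""

    def __init__(self, in_dim: int, hidden: int = 16, p_drop: float = 0.1, seed: int = 0):
        self.rng = random.Random(seed)
        self.p_drop = p_drop
        # Random weights stand in for trained parameters (assumption).
        self.w1: List[List[float]] = [
            [self.rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)]
            for _ in range(hidden)
        ]
        self.w2: List[float] = [self.rng.gauss(0.0, hidden ** -0.5) for _ in range(hidden)]

    def forward(self, s: Sequence[float], a: Sequence[float]) -> float:
        """One stochastic forward pass with a freshly sampled dropout mask."""
        x = list(s) + list(a)
        q = 0.0
        for row, w_out in zip(self.w1, self.w2):
            h = max(0.0, sum(w * xi for w, xi in zip(row, x)))  # ReLU hidden unit
            if self.rng.random() < self.p_drop:
                continue  # this unit is dropped for the current pass
            q += w_out * h / (1.0 - self.p_drop)  # inverted-dropout rescaling
        return q

    def predict(self, s: Sequence[float], a: Sequence[float], n_samples: int = 50) -> Tuple[float, float]:
        """Mean and variance of Q(s, a) across n_samples dropout masks."""
        qs = [self.forward(s, a) for _ in range(n_samples)]
        return statistics.fmean(qs), statistics.pvariance(qs)


if __name__ == "__main__":
    q_net = MCDropoutQ(in_dim=3, p_drop=0.2)
    q_mean, q_var = q_net.predict(s=[0.5, -0.2], a=[0.1])
    print("Q mean:", q_mean, "Q variance:", q_var)
```

The variance across passes is one common proxy for model uncertainty. How the patent decomposes uncertainty and feeds it into the policy gradient step is not described in this excerpt, so the sketch stops at the estimator.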

Description

Technical field

[0001] The invention relates to a probability-based policy migration method and belongs to the technical field of artificial intelligence.

Background

[0002] Poor virtual-to-real policy transfer is an important factor limiting the wider application of reinforcement learning. For general continuous control learning problems, a common solution is to train in a virtual environment and then transfer the trained policy network to the real environment at the cost of zero or few samples, which involves two environments. For problems such as space robot operation, which require extensive ground reliability testing, migration involves at least three environments: the virtual environment, the ground test environment, and the real space environment. The environment of such tasks is affected by high dynamics and uncertainty. In fact, it is difficult to use the virtual environment to approach the real environment, whi...

Application Information

Patent Type & Authority: Applications (China)

IPC(8): G06N20/00, G06N3/08

CPC: G06N20/00, G06N3/08

Inventor: 解永春, 李林峰, 王勇, 陈奥

Owner: BEIJING INST OF CONTROL ENG
