Policy migration method based on probability

A probability-based policy migration technique, applied in the field of policy transfer. It addresses the problem that a virtual environment can hardly approximate the real environment, which limits policy learning efficiency and policy performance, and it is intended to apply across a wide range of algorithms and engineering settings.

Pending Publication Date: 2022-07-22
BEIJING INST OF CONTROL ENG

AI Technical Summary

Problems solved by technology

The environments of continuous control tasks such as robot operation are highly dynamic and uncertain. In practice, a virtual environment can hardly approximate the real environment, which limits further improvement of policy learning efficiency and policy operation performance.


Embodiment Construction

[0053] To make the above technical solutions easier to understand, the technical solutions of the present application are described in detail below with reference to the accompanying drawings and specific embodiments. The embodiments do not limit the technical solutions of the present application, and the embodiments and the technical features in them may be combined with each other where no conflict arises.

[0054] A probability-based policy migration method provided by the embodiments of the present application is described in further detail below with reference to the accompanying drawings. Specific implementations may include the following steps (as shown, for example, in Figures 1 to 3):

[0055] Step 1: collect training data in the virtual environment. The virtual environment S is a generative model that can generate data: the system state is denoted s, the reward at the current moment is r, and the state at the next moment is s';...
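As a rough illustration of Step 1, the sketch below collects transitions from a toy generative model. Everything in it is an assumption made for illustration only: the one-dimensional dynamics, the reward, the random exploration policy, the action `a` (the source text is truncated before any action is defined), and the names `ToyVirtualEnv`, `Transition`, and `collect`. It shows only the general shape of gathering (s, r, s') samples, not the procedure of the application.

```python
import random
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    s: float       # system state at the current moment
    a: float       # action (assumed; not defined in the visible source text)
    r: float       # reward at the current moment
    s_next: float  # state at the next moment, s'


class ToyVirtualEnv:
    """Hypothetical 1-D generative model: s' = s + a + noise, r = -|s'|."""

    def __init__(self, noise_std: float = 0.05, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.noise_std = noise_std
        self.s = 0.0

    def reset(self) -> float:
        # Draw a random initial state.
        self.s = self.rng.uniform(-1.0, 1.0)
        return self.s

    def step(self, a: float) -> Tuple[float, float]:
        # Generate the next state and the reward for the current step.
        s_next = self.s + a + self.rng.gauss(0.0, self.noise_std)
        r = -abs(s_next)
        self.s = s_next
        return r, s_next


def collect(env: ToyVirtualEnv, n_episodes: int, horizon: int,
            seed: int = 1) -> List[Transition]:
    """Roll out a uniform random policy and store every transition."""
    rng = random.Random(seed)
    data: List[Transition] = []
    for _ in range(n_episodes):
        s = env.reset()
        for _ in range(horizon):
            a = rng.uniform(-0.1, 0.1)  # placeholder exploration policy
            r, s_next = env.step(a)
            data.append(Transition(s, a, r, s_next))
            s = s_next
    return data


buffer = collect(ToyVirtualEnv(), n_episodes=5, horizon=20)
print(len(buffer))  # 5 episodes x 20 steps = 100 transitions
```

The stored tuples could then feed a Q-function estimator; a real virtual environment would replace the toy dynamics with a simulator of the task.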

Abstract

The invention discloses a probability-based policy migration method and belongs to the technical field of artificial intelligence. The environments of continuous control tasks such as robot operation are highly dynamic and uncertain, and in practice a virtual environment can hardly approximate the real environment. The method constructs a probabilistic Q-function estimator through Monte Carlo dropout and combines it with policy gradient optimization, so that the algorithm is able to identify environmental uncertainty. Specifically, the method comprises four stages: virtual environment training data collection, uncertainty decomposition and inference, policy gradient optimization, and real environment operation performance evaluation. Together these stages decompose and measure environmental uncertainty and improve policy learning efficiency and policy operation performance.
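The idea of a Monte Carlo dropout Q-function estimator can be sketched as follows. This is a minimal illustration under stated assumptions, not the estimator of the application: the network size, the random untrained weights, the dropout rate, and the names `init_q_network`, `q_forward`, and `mc_dropout_q` are all hypothetical. Dropout is kept active at inference, the Q value is evaluated several times, and the spread of the samples serves as a simple uncertainty proxy. The uncertainty decomposition mentioned in the abstract is not reproduced here, because the visible text does not specify it.

```python
import random
import statistics
from typing import List, Sequence, Tuple

Network = Tuple[List[List[float]], List[float]]


def init_q_network(in_dim: int, hidden: int = 16, seed: int = 0) -> Network:
    """Random weights for a one-hidden-layer Q network (untrained, illustrative)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(hidden)]
          for _ in range(in_dim)]
    w2 = [rng.gauss(0.0, hidden ** -0.5) for _ in range(hidden)]
    return w1, w2


def q_forward(net: Network, x: Sequence[float],
              rng: random.Random, drop_p: float) -> float:
    """One stochastic forward pass with dropout applied to the hidden layer."""
    w1, w2 = net
    q = 0.0
    for j in range(len(w2)):
        h = max(0.0, sum(x[i] * w1[i][j] for i in range(len(x))))  # ReLU unit
        if rng.random() < drop_p:
            continue  # this hidden unit is dropped in the current pass
        q += w2[j] * h / (1.0 - drop_p)  # inverted-dropout rescaling
    return q


def mc_dropout_q(net: Network, s: Sequence[float], a: Sequence[float],
                 n_samples: int = 50, drop_p: float = 0.1,
                 seed: int = 0) -> Tuple[float, float]:
    """Mean and variance of Q(s, a) over repeated dropout passes (drop_p < 1)."""
    rng = random.Random(seed)
    x = list(s) + list(a)
    samples = [q_forward(net, x, rng, drop_p) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.pvariance(samples)


net = init_q_network(in_dim=3)
q_mean, q_var = mc_dropout_q(net, s=[0.1, -0.2], a=[0.05])
print(q_mean, q_var)
```

With `drop_p=0.0` every pass is identical and the variance collapses to zero, which is a quick sanity check that the spread comes from the dropout masks alone.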

Description

Technical field

[0001] The invention relates to a probability-based policy migration method and belongs to the technical field of artificial intelligence.

Background technique

[0002] Poor virtual-to-real policy transfer performance is an important factor limiting the wider application of reinforcement learning. For general continuous control learning problems, a common solution is to train in a virtual environment and then transfer the trained policy network to the real environment at the cost of zero or few samples, which involves two environments. For problems such as space robot operation, a large number of ground reliability tests are needed, so migration involves at least three environments: the virtual environment, the ground test environment, and the real space environment. The environment of such tasks is affected by high dynamics and uncertainty. In fact, it is difficult to use the virtual environment to approach the real environment, whi...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06N20/00, G06N3/08
CPC: G06N20/00, G06N3/08
Inventor: 解永春, 李林峰, 王勇, 陈奥
Owner: BEIJING INST OF CONTROL ENG