
Reinforcement learning model optimization method and device based on parameter noise

A reinforcement learning model optimization method based on parameter noise. It addresses problems such as low recognition accuracy and slow convergence speed, and achieves the effects of improving exploration efficiency and accelerating model convergence.

Inactive Publication Date: 2019-04-19
SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

AI Technical Summary

Problems solved by technology

[0007] The main purpose of the present invention is to propose a parameter noise-based reinforcement learning model optimization method and device that overcome the low recognition accuracy and slow convergence of existing value-based and policy-based methods.



Examples


Embodiment Construction

[0029] The present invention will be further described below in conjunction with the accompanying drawings and preferred embodiments.

[0030] As noted above, the present invention can be widely applied in industry to unmanned driving, robotic arm control, and other areas. For ease of verification, however, the parameter noise-based reinforcement learning model optimization method proposed in this embodiment is verified on a variety of game simulators. The simulator website is http://www.mujoco.org/. The simulation results show that this embodiment achieves better results than previous methods. Game simulators are used because verification is inexpensive, variables are easy to control, and verification is more effective. Simulator verification effectively serves as a reliability evaluation ahead of building the actual system.

[0031] The estimation method includes the fo...


Abstract

The invention provides a reinforcement learning model optimization method based on parameter noise. Parameter noise drawn from periodic Gaussian distributions is added heuristically to the model parameters, which enlarges the range of unknown decisions explored during training. Better decisions are therefore found more quickly, the convergence speed increases, and an optimal model is obtained with less training. Adopting this method thus improves the model's exploration efficiency and accelerates model convergence. The method comprises the following steps:
S1, acquiring a group of data from a game simulator and inputting the data into the model;
S2, having the model make a decision on the simulator scene data;
S3, adding multiple noises with different Gaussian distributions to the decision model parameters;
S4, making a plurality of different decision-making behaviors through S3;
S5, calculating the loss for the different behaviors obtained in S4 and averaging the losses to update the model parameters;
S6, repeating S1 to S5 until the loss no longer decreases, thereby obtaining an optimal model.
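Steps S1 to S6 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the patent: the "simulator" is a toy linear task rather than MuJoCo, the model is a single weight vector, the loss is a mean squared error, and the names make_batch, loss_and_grad, train, sigmas, and patience are hypothetical. The sketch uses a fixed set of Gaussian standard deviations and does not attempt to model the "periodic" aspect of the noise, which the abstract does not specify.

```python
import numpy as np

def make_batch(rng, n=64, dim=4):
    """S1 (stand-in): sample a batch of 'scene' data and the ideal decisions."""
    states = rng.normal(size=(n, dim))
    true_w = np.arange(1.0, dim + 1.0)  # hidden optimum of the toy task
    return states, states @ true_w

def loss_and_grad(w, states, targets):
    """S2 and S5 (per behavior): decide with parameters w, then score the decision."""
    err = states @ w - targets                  # decision minus ideal decision
    loss = float(np.mean(err ** 2))
    grad = 2.0 * states.T @ err / len(targets)  # gradient of the mean squared error
    return loss, grad

def train(sigmas=(0.01, 0.05, 0.1), lr=0.05, patience=20, max_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(4)
    history, best, stale = [], np.inf, 0
    for _ in range(max_steps):
        states, targets = make_batch(rng)                        # S1
        losses, grads = [], []
        for sigma in sigmas:
            # S3: perturb the parameters with Gaussian noise of a given scale
            w_noisy = w + rng.normal(scale=sigma, size=w.shape)
            # S4: each noisy copy produces a different behavior, which is scored
            loss, grad = loss_and_grad(w_noisy, states, targets)
            losses.append(loss)
            grads.append(grad)
        # S5: average over the behaviors and update the un-noised parameters
        w = w - lr * np.mean(grads, axis=0)
        mean_loss = float(np.mean(losses))
        history.append(mean_loss)
        # S6: repeat until the averaged loss stops decreasing
        if mean_loss < best:
            best, stale = mean_loss, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return w, history

if __name__ == "__main__":
    w, history = train()
    print("learned parameters:", np.round(w, 2))
    print("first / last averaged loss:", history[0], history[-1])
```

The patience counter is one possible reading of "until the loss is not reduced"; a real reinforcement learning setup would replace the toy loss with a value-based or policy-based objective computed from simulator rollouts.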

Description

Technical field

[0001] The present invention relates to the field of deep reinforcement learning in artificial intelligence, and in particular to a parameter noise-based reinforcement learning model optimization method and device.

Background technique

[0002] Reinforcement learning has a wide range of applications and rich application prospects. The Google DeepMind team implemented a reinforcement learning model that surpasses humans in the game of Go; reinforcement learning also has extremely high application value in industry for unmanned driving and robotic arm control.

[0003] Reinforcement learning learns how to behave in a given environment so as to obtain the maximum reward. Unlike supervised learning, reinforcement learning has no labels, but by learning the returns obtained from exploring (exploration) unknown behaviors and exploiting (exploitation) historical behaviors in a specific en...


Application Information

IPC(8): G06N20/00
Inventors: 王好谦, 李达, 张永兵, 戴琼海
Owner: SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV