
A Deep Value Function Learning Method for Agents Based on State Distribution Perceptual Sampling

A value-function and state-distribution technology, applied in the field of reinforcement learning, which addresses problems such as large differences in sample quantity across states, and achieves the effects of improved learning speed, improved sample-use efficiency, and good application value.

Active Publication Date: 2022-04-19
ZHEJIANG UNIV
Cites: 7 | Cited by: 0

AI Technical Summary

Problems solved by technology

However, this method does not fundamentally solve two problems: 1. samples from different states can be of similar importance yet be generated in very different quantities, so by what criterion should one sample from the empirical data set to avoid oversampling redundant samples; 2. since the samples themselves are high-dimensional, huge in number, and continuously generated, effectively analyzing a large number of high-dimensional samples is a key factor, so how can one sample efficiently from a large and continuously growing sample set.

Method used



Examples


Embodiment

[0059] This embodiment is implemented as described above, so the specific steps are not repeated here; below, only the effect is shown on case data.

[0060] First, a hash method is used to reduce the dimensionality of, and classify, the abstract representation of the agent's observed state set obtained through a convolutional neural network, so as to perceive the distribution of the state space. On this basis, samples in the empirical data set are selected in a principled way. Finally, the selected sample data are used to train the agent's value function, so that it judges the environment more accurately. The results are shown in Figures 1, 2, and 3.
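The paragraph above compresses the whole pipeline: hash the CNN features to perceive the state distribution, then select samples accordingly. As an illustration only, the sketch below uses SimHash-style random-projection hashing to bucket feature vectors; the patent names "a hash method" without specifying which one, so `simhash_codes`, `n_bits`, and the bucketing helper are all assumptions, not the patented implementation.

```python
# Illustrative sketch only -- the patent does not specify the hash
# function; SimHash-style random projections are an assumption here.
import numpy as np
from collections import defaultdict

def simhash_codes(features, n_bits=16, seed=0):
    """Map high-dimensional CNN feature vectors to short binary codes.

    features: (N, D) array of feature embeddings.
    Returns an (N,) integer array; similar states tend to share a code,
    so the histogram of codes approximates the state distribution.
    """
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((features.shape[1], n_bits))  # random hyperplanes
    bits = (features @ planes) > 0                             # sign pattern per vector
    return bits.dot(1 << np.arange(n_bits))                    # pack bits into one int

def bucket_by_code(codes):
    """Group sample indices by hash code to expose the state distribution."""
    buckets = defaultdict(list)
    for idx, code in enumerate(codes):
        buckets[int(code)].append(idx)
    return buckets
```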

[0061] Figure 1 is a schematic diagram of the distribution of the samples in the state space, obtained by visualizing the samples after performing steps S1 and S2 of the present invention on the original empirical data;

[0062] Figure 2 is the result of adopting three sam...



Abstract

The invention discloses an agent deep value-function learning method based on state-distribution-aware sampling, which enables an agent to learn a value function quickly with fewer samples. It specifically includes the following steps: 1) obtain empirical data for the agent to learn the value function, and define the algorithm's objective; 2) preprocess the empirical data with a convolutional neural network to obtain a feature set with stronger expressive power; 3) in the feature space of the empirical data set, cluster the empirical data with an unsupervised method; 4) according to the state distribution of the empirical data set, sample with a state-distribution-aware method that interpolates between uniform sampling and cluster equal-probability sampling; 5) the agent learns the value function from the sampled data. The invention is applicable to game-playing problems in the reinforcement learning field, and can achieve good results quickly with a small sample size.
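Step 4 is the core of the method: a sampling distribution that interpolates between uniform sampling over samples and equal-probability sampling over clusters. The minimal sketch below shows one way such an interpolation could be realized; the mixing weight `alpha` and the per-sample probability formula are assumptions, since the abstract does not give the exact interpolation rule.

```python
# Hedged sketch of step 4: the abstract says sampling interpolates
# between uniform and cluster equal-probability sampling but gives no
# formula; the convex mix with weight `alpha` below is an assumption.
import numpy as np

def interpolated_sampling_probs(cluster_ids, alpha=0.5):
    """Return a per-sample probability vector over the experience set.

    cluster_ids: (N,) cluster label of each sample (from step 3).
    alpha = 1.0 -> plain uniform sampling over samples;
    alpha = 0.0 -> pick a cluster uniformly, then a sample inside it.
    """
    cluster_ids = np.asarray(cluster_ids)
    n = len(cluster_ids)
    labels, counts = np.unique(cluster_ids, return_counts=True)
    k = len(labels)
    size_of = dict(zip(labels, counts))
    cluster_equal = np.array([1.0 / (k * size_of[c]) for c in cluster_ids])
    probs = alpha * (1.0 / n) + (1.0 - alpha) * cluster_equal
    return probs / probs.sum()  # guard against floating-point drift

# Usage (step 5): draw a minibatch for value-function training.
# batch_idx = np.random.choice(n_samples, size=32,
#                              p=interpolated_sampling_probs(labels, alpha=0.5))
```

With alpha = 0 every cluster contributes equally, so rarely visited states are no longer drowned out by abundant ones; raising alpha trades that rebalancing against plain uniform sampling.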

Description

Technical field

[0001] The invention belongs to the field of reinforcement learning, a branch of machine learning, and in particular relates to a sample-sampling method based on state-distribution perception of empirical data.

Background technique

[0002] Sample selection is an important issue in machine learning, and different selection methods directly affect the quality of model learning. In reinforcement learning, sampling from an empirical data set helps overcome sample correlation and the forgetting of earlier samples. The goal of sample sampling is to select, from the sample set, samples that speed up model convergence and enhance the agent's ability to perceive the environment. The traditional method draws from the experience data set by random uniform sampling, which easily causes sample imbalance and slows the agent's learning.

[0003] The existing sampling m...
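For contrast with the background just described, here is a minimal sketch of the traditional baseline: random uniform sampling from an experience data set. It is a generic experience-replay buffer, not the patented method; the class name and capacity are illustrative.

```python
# Generic experience-replay baseline (not the patented method): every
# stored transition is equally likely to be drawn, however redundant
# its state -- the sample imbalance the invention targets.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.storage = deque(maxlen=capacity)  # oldest transitions fall out first

    def add(self, state, action, reward, next_state, done):
        self.storage.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Random uniform sampling, as in the traditional method above.
        return random.sample(self.storage, batch_size)
```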

Claims


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G06N20/00, G06K9/62
CPC: G06F18/2321, G06F18/24
Inventor: 李玺, 李伟超, 皇福献
Owner: ZHEJIANG UNIV