Deep Q network reinforcement learning method and device for cognitive behavior model acceleration

A deep Q-network technique, applied in the field of reinforcement learning, that addresses the lack of learning ability and weak generalization of hand-built cognitive models and the long training time of agents, and alleviates the impact of large state spaces and sparse rewards on learning efficiency.

Pending Publication Date: 2021-10-26
NAT UNIV OF DEFENSE TECH
Cites: 10; Cited by: 2

AI Technical Summary

Problems solved by technology

On the one hand, cognitive knowledge is typically constructed through hand-engineered programming, so it generalizes and adapts poorly and has no ability to learn.
On the other hand, although existing deep reinforcement learning algorithms have succeeded in many applications, they still suffer from notable problems such as long agent training times, high computing power requirements, and slow model convergence.

Detailed Description of Embodiments

[0023] To make the objectives, technical solutions, and advantages of the present disclosure clearer, the disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.

[0024] It should be noted that, unless otherwise defined, the technical or scientific terms used in the embodiments of the present disclosure have the ordinary meanings understood by those skilled in the art to which the present disclosure belongs. "First", "second", and similar words used in the embodiments of the present disclosure do not indicate any order, quantity, or importance; they are used only to distinguish different components. "Comprising", "including", and similar words mean that the element or item preceding the word covers the elements or items listed after the word and their equivalents, without excluding other elements or items. Words such as "connected" or "coupled" are not lim...

Abstract

The invention provides a deep Q network reinforcement learning method and device accelerated by a cognitive behavior model. The method comprises: using the cognitive behavior model to obtain state information from an environment, derive cognitive behavior knowledge from that state information, and pass the knowledge to a heuristic strategy network; using a deep reinforcement learning model to obtain state information from the environment; using the heuristic strategy network to compute a heuristic strategy value from the state information and the cognitive behavior knowledge, and sending that value to a deep Q network; using the deep Q network to select and execute an action according to the state information and the heuristic strategy value; using the deep reinforcement learning model to obtain a reward from the environment and iteratively update the heuristic strategy network and the deep Q network; and repeating these operations, ending reinforcement learning once the deep Q network is determined to have converged. The cognitive behavior model and the heuristic strategy network accelerate convergence of the deep Q network, effectively mitigating the impact of a huge state space and sparse rewards on learning efficiency.
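
The loop described in the abstract can be illustrated with a small sketch. This is not the patented implementation: tables stand in for the deep Q network and the heuristic strategy network, the environment is a hypothetical eight-state chain with a single sparse reward, and the cognitive behavior model is a one-line hand-written rule. The additive combination `Q + xi * H`, the heuristic update rule, and every name and parameter below (`env_step`, `cognitive_model`, `xi`, `eta`) are assumptions made for illustration, since the abstract does not specify them. A fixed episode count also replaces the convergence check.

```python
import random

N_STATES = 8          # chain of states 0..7; reward only at the right end (sparse)
ACTIONS = (0, 1)      # 0 = move left, 1 = move right

def env_step(state, action):
    """Hypothetical toy environment: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def cognitive_model(state):
    """Hand-written stand-in for the cognitive behavior model: advises an action."""
    return 1  # assumed domain rule: "the goal lies to the right"

def select_action(q, h, state, xi, epsilon, rng):
    """Epsilon-greedy over Q(s, a) + xi * H(s, a) (assumed combination rule)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    scores = [q[state][a] + xi * h[state][a] for a in ACTIONS]
    return max(ACTIONS, key=lambda a: scores[a])

def train(episodes=200, alpha=0.5, gamma=0.9, xi=1.0, eta=0.5, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular stand-in for the deep Q network
    h = [[0.0, 0.0] for _ in range(N_STATES)]  # tabular stand-in for the heuristic network
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            # 1. The cognitive model turns the state into behavior knowledge (advice).
            advice = cognitive_model(state)
            # 2. The heuristic value moves toward 1 for the advised action, 0 otherwise.
            for a in ACTIONS:
                target = 1.0 if a == advice else 0.0
                h[state][a] += eta * (target - h[state][a])
            # 3. The action is chosen from the Q-values biased by the heuristic values.
            action = select_action(q, h, state, xi, epsilon, rng)
            # 4. The environment returns a reward; apply a one-step Q-learning update.
            nxt, reward, done = env_step(state, action)
            best_next = 0.0 if done else max(q[nxt])
            q[state][action] += alpha * (reward + gamma * best_next - q[state][action])
            state, steps = nxt, steps + 1
    return q, h

if __name__ == "__main__":
    q, h = train()
    for s in range(N_STATES - 1):
        print(s, [round(v, 3) for v in q[s]], [round(v, 3) for v in h[s]])
```

In this sketch the heuristic term steers early exploration toward the advised action while the Q-values are still near zero, which is the intuition behind using prior knowledge to offset sparse rewards. Setting `xi=0.0` reduces it to plain epsilon-greedy Q-learning for comparison.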

Description

Technical field

[0001] The present disclosure relates to the technical field of reinforcement learning, and in particular to a deep Q-network reinforcement learning method and device accelerated by a cognitive behavior model.

Background

[0002] Low sample efficiency has long limited the application of reinforcement learning algorithms to complex problems. In reinforcement learning, the agent learns by interacting with the environment through trial and error, so a large number of interaction samples is often required to fully explore the state-action space and converge to the optimal policy. The problem is especially pronounced in complex tasks, such as those with high-dimensional or continuous state spaces or sparse environment rewards.

[0003] Utilizing appropriate prior knowledge or transferring the learned policy model is an effective mean...

Application Information

IPC(8): G06N3/08, G06N3/04
CPC: G06N3/08, G06N3/045
Inventor: 黄健, 李嘉祥, 陈浩, 刘权, 张中杰, 付可, 韩润海
Owner NAT UNIV OF DEFENSE TECH