
Underwater vehicle target area floating control method based on double-critic reinforcement learning technology

A floating (surfacing) control technology for the target area of an underwater vehicle, applicable to neural-network learning methods, underwater vessels, underwater operation equipment, and related fields. It addresses the explosive growth in the number of Q values, the slow convergence of algorithm training, and the failure to use expert data that is easy to obtain and reliable in performance, and it achieves good control performance and fast convergence.

Active Publication Date: 2021-06-25
SHANDONG UNIV

AI Technical Summary

Problems solved by technology

[0005] However, existing research and inventions that apply traditional RL to underwater vehicle control have some significant defects. First, traditional reinforcement learning algorithms such as Q-learning must construct a huge Q-value table to store the Q(s, a) values over the high-dimensional action and state spaces. As the agent keeps training in those high-dimensional spaces, the number of Q values in the table grows explosively, which makes this method very limited.
Then, when the Google DeepMind team combined deep learning with traditional Q-learning, deep reinforcement learning (DRL) was born. In this approach the Q-value table is replaced by a neural network, giving DQN (Deep Q-Network) (V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529-533, 2015). However, DQN is only suitable for discrete action spaces, which restricts its application to the intelligent control of underwater vehicles. DDPG (Deep Deterministic Policy Gradient) (Lillicrap T P, Hunt J J, Pritzel A, et al. Continuous control with deep reinforcement learning [J]. Computer Science, 2015, 8(6): A187.) is a control algorithm suitable for continuous action spaces, but the Q(s, a) output by its critic network comes from the expectation of the action-value function, which leads to overestimation.
Moreover, the above RL methods do not make use of expert data, which is easy to obtain and reliable in performance. As a result, the algorithm converges slowly in training and behaves very randomly in the early stage of training.
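The Q-table growth described above can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the patent: it assumes every state and action dimension is discretized into the same number of bins, and the function name `q_table_entries` and the chosen dimensions are purely illustrative.

```python
def q_table_entries(state_dims: int, action_dims: int, bins_per_dim: int) -> int:
    """Count the Q(s, a) cells of a tabular method.

    Assumption (not from the source): every state and action dimension
    is discretized into the same number of bins.
    """
    n_states = bins_per_dim ** state_dims    # number of discrete states
    n_actions = bins_per_dim ** action_dims  # number of discrete actions
    return n_states * n_actions


# Illustrative sizes only: 2 action dimensions, 10 bins per dimension.
for state_dims in (1, 3, 6):
    print(state_dims, q_table_entries(state_dims, action_dims=2, bins_per_dim=10))
```

The table size grows exponentially with the number of dimensions, which is the practical reason tabular Q-learning is replaced by a neural network approximator in DQN and DDPG.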


Examples


Embodiment 1

[0098] A method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology. The implementation of the present invention is divided into two parts, the task environment construction stage and the floating strategy training stage, and includes the following steps:

[0099] 1. Define the task environment and model:

[0100] 1-1. Construct the task environment of the target area where the underwater vehicle is located and the dynamic model of the underwater vehicle;

[0101] The task environment for the underwater vehicle simulation is written in Python in the VS Code integrated development environment. The geographic coordinate system E-ξηζ of the constructed simulated pool map is shown in Figure 3. The size of the three-dimensional pool is set to 50 meters * 50 meters * 50 meters, and the successful floating area of the target area is a cylindrical area with the center of the wa...
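The geometry in this step could be checked with helpers along the following lines. This is only a sketch under stated assumptions: the 50 m pool size comes from the text, but the source is cut off before it gives the cylinder's center and dimensions, so those are left as function arguments. The origin at one pool corner, the vertical cylinder axis, and the names `inside_pool` and `inside_target_cylinder` are all hypothetical.

```python
import math

POOL_SIZE_M = 50.0  # pool edge length stated in the text (50 m * 50 m * 50 m)


def inside_pool(x: float, y: float, z: float) -> bool:
    """True if the point lies inside the cubic pool.

    Assumption: the coordinate origin sits at one corner of the pool.
    """
    return all(0.0 <= c <= POOL_SIZE_M for c in (x, y, z))


def inside_target_cylinder(x, y, z, center_xy, radius, z_min, z_max) -> bool:
    """True if the point lies inside a vertical cylinder.

    All cylinder parameters are hypothetical: the source text is truncated
    before it states the center, radius, or height of the success region.
    """
    horizontal = math.hypot(x - center_xy[0], y - center_xy[1])
    return horizontal <= radius and z_min <= z <= z_max


# Placeholder values, chosen only to exercise the functions.
print(inside_pool(25.0, 25.0, 10.0))
print(inside_target_cylinder(25.0, 25.0, 0.5, (25.0, 25.0), 1.0, 0.0, 1.0))
```

A simulation environment would typically use checks like these to decide whether an episode ends in success (vehicle inside the target region) or failure (vehicle outside the pool).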

Abstract

The invention relates to a method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology, and belongs to the technical field of ocean control experiments. The method builds on the DDPG algorithm framework in deep reinforcement learning. While the underwater vehicle agent is trained, it uses both previously obtained expert data and the interaction data produced as the agent interacts with the task environment; sampling from this mixture of the two greatly improves the convergence speed of the algorithm. In addition, two mutually independent groups of critic networks are used, and the loss function of the actor network is computed from the minimum of the Q(s, a) values output by the two groups, which effectively reduces the overestimation bias present in reinforcement learning algorithms.
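The two ideas in the abstract can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the patented implementation: the fixed `expert_fraction`, the list-based buffers, and the function names `mixed_batch` and `actor_loss` are hypothetical, and the negative-mean sign convention follows common DDPG practice rather than anything stated in the source.

```python
import random


def mixed_batch(expert_buffer, agent_buffer, batch_size, expert_fraction, rng):
    """Draw one training batch from expert data and agent interaction data.

    Assumption: a fixed fraction of each batch comes from the expert buffer;
    the source only says the two data sources are mixed.
    """
    n_expert = min(int(batch_size * expert_fraction), len(expert_buffer))
    n_agent = min(batch_size - n_expert, len(agent_buffer))
    return rng.sample(expert_buffer, n_expert) + rng.sample(agent_buffer, n_agent)


def actor_loss(q1_values, q2_values):
    """Actor loss built from the element-wise minimum of two critics' outputs.

    Assumption: the loss is the negative mean of min(Q1, Q2), so that
    minimizing the loss raises the more pessimistic value estimate.
    """
    mins = [min(q1, q2) for q1, q2 in zip(q1_values, q2_values)]
    return -sum(mins) / len(mins)


# Toy data: transitions are just labeled tuples here.
expert = [("expert", i) for i in range(20)]
agent = [("agent", i) for i in range(100)]
batch = mixed_batch(expert, agent, batch_size=8, expert_fraction=0.25,
                    rng=random.Random(0))
print(sum(1 for source, _ in batch if source == "expert"), len(batch))
print(actor_loss([1.0, 2.0], [1.5, 0.5]))
```

Taking the minimum of two independently trained critics gives a pessimistic value estimate, which counteracts the upward bias a single critic tends to accumulate.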

Description

Technical field

[0001] The invention relates to a method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology, and belongs to the technical field of ocean control experiments.

Background technique

[0002] As key marine equipment, underwater vehicles are widely used in many scientific research and engineering fields such as ocean topographic mapping, resource exploration, archaeological investigation, pipeline maintenance, and biological monitoring, and are an important means for human beings to explore the ocean. However, the seabed environment is complex and changeable. If an underwater vehicle working in such an environment encounters a fault or strong interference and fails to float up to the area where the mother ship is located in a timely, safe and intelligent manner, economic losses and the loss of important data are inevitable. Therefore, in order to enhance the adaptabil...


Application Information

IPC(8): G06F30/28, G06N3/04, G06N3/08, B63G8/18, B63G8/14
CPC: G06N3/08, G06F30/28, B63G8/14, B63G8/18, G06N3/045
Inventor: 李沂滨, 张天泽, 缪旭弘, 魏征, 尤岳, 周广礼, 贾磊, 庄英豪, 宋艳
Owner SHANDONG UNIV