Underwater vehicle target area floating control method based on double-commentator reinforcement learning technology

An underwater vehicle target-area floating control technology, applied to neural learning methods, underwater vessels, underwater operation equipment, etc. It addresses the explosive growth in the number of Q values, the slow convergence of algorithm training, and the failure to use expert data that is easy to obtain and performs reliably. It achieves good control performance and fast convergence.

Active Publication Date: 2021-06-25

SHANDONG UNIV

Cites: 5; Cited by: 5

AI Technical Summary

Problems solved by technology

[0005] However, existing research and inventions that apply traditional RL to underwater vehicle control have several significant defects. First, traditional reinforcement learning algorithms such as Q-learning must build a huge Q-value table to store the Q(s, a) values over high-dimensional action and state spaces. As the agent keeps training in these high-dimensional spaces, the number of entries in the Q-value table explodes, which severely limits this method.
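
To make the scaling problem concrete, here is a minimal Python sketch of tabular Q-learning together with a count of table entries when continuous states and actions are discretized. The discretization scheme (a fixed number of bins per dimension) and the example dimensions are illustrative assumptions, not values from the patent.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Iterable, Tuple

# Q table keyed by (state, action); unseen pairs default to 0.0.
QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_table_entries(state_dims: int, action_dims: int, bins: int) -> int:
    """Number of Q(s, a) entries if every state and action dimension is
    discretized into `bins` levels (illustrative assumption)."""
    return bins ** (state_dims + action_dims)


def q_learning_update(q: QTable, s: Hashable, a: Hashable, r: float,
                      s_next: Hashable, actions: Iterable[Hashable],
                      alpha: float = 0.1, gamma: float = 0.99) -> float:
    """One tabular Q-learning step:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_b Q(s', b) - Q(s, a))."""
    best_next = max(q[(s_next, b)] for b in actions)
    q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
    return q[(s, a)]


if __name__ == "__main__":
    # Hypothetical sizes: 6 state dimensions, 3 action dimensions, 10 bins each.
    print(q_table_entries(state_dims=6, action_dims=3, bins=10))  # 1000000000
    q: QTable = defaultdict(float)
    print(q_learning_update(q, s=0, a=1, r=1.0, s_next=1, actions=[0, 1],
                            alpha=0.5, gamma=0.9))  # 0.5
```

Even this coarse grid needs a billion entries, and each added dimension multiplies the count by `bins`. This growth is what replacing the table with a neural network avoids.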

Next, by combining deep learning with traditional Q-learning, the Google DeepMind team gave rise to deep reinforcement learning (DRL). In DRL the Q-value table is replaced by a neural network, yielding DQN (Deep Q-Network) (V. Mnih et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, pp. 529-533, 2015). However, DQN is only suitable for discrete action spaces, which restricts its application to intelligent control of underwater vehicles. DDPG (Deep Deterministic Policy Gradient) (T. P. Lillicrap, J. J. Hunt, A. Pritzel, et al., "Continuous control with deep reinforcement learning," Computer Science, 2015, 8(6): A187) is a control algorithm suited to continuous action spaces. However, the Q(s, a) output by its critic network comes from the expectation of the action-value function, which causes it to overestimate Q values.
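
For reference, the core DDPG critic update can be sketched in NumPy. The array `q_next` stands for Q'(s', μ'(s')) computed by hypothetical target actor and critic networks, which are omitted here. The defaults for `gamma` and `tau` are illustrative, not values from the patent. Because the single critic's own estimate feeds back into its target, estimation errors can be propagated and amplified; this is where the overestimation noted above can creep in.

```python
import numpy as np


def ddpg_critic_target(rewards: np.ndarray, dones: np.ndarray,
                       q_next: np.ndarray, gamma: float = 0.99) -> np.ndarray:
    """Bellman target for the DDPG critic:
    y = r + gamma * (1 - done) * Q'(s', mu'(s')).
    `q_next` holds the target critic's values at the target actor's actions."""
    return rewards + gamma * (1.0 - dones) * q_next


def critic_loss(q_pred: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between the critic's Q(s, a) and the target y."""
    return float(np.mean((q_pred - y) ** 2))


def soft_update(target_params, online_params, tau: float = 0.005):
    """Polyak averaging of target-network parameters:
    theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * p + (1.0 - tau) * tp
            for tp, p in zip(target_params, online_params)]


if __name__ == "__main__":
    y = ddpg_critic_target(np.array([1.0, 0.0]), np.array([0.0, 1.0]),
                           np.array([2.0, 5.0]), gamma=0.5)
    print(y)  # [2. 0.]  (second transition is terminal, so no bootstrap)
```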

Moreover, the RL methods above do not consider expert data, which is easy to obtain and performs reliably. As a result, these algorithms converge slowly during training, and the early stage of training involves a great deal of randomness.
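
The patent's abstract states that previously obtained expert data are mixed with the agent's interaction data, but this excerpt does not say how. The sketch below shows one plausible way to mix them: a replay buffer that draws a share of each batch from a fixed expert set. The class name, `expert_ratio`, `capacity`, and the shortfall rule are hypothetical choices, not the patent's method.

```python
import random
from collections import deque
from typing import Any, Deque, List, Sequence


class MixedReplayBuffer:
    """Hypothetical replay buffer mixing fixed expert transitions with
    transitions collected by the agent. `expert_ratio` is the target share
    of expert samples per batch (an assumption; the source gives none)."""

    def __init__(self, expert_data: Sequence[Any], capacity: int = 100_000,
                 expert_ratio: float = 0.25, seed: int = 0) -> None:
        if not 0.0 <= expert_ratio <= 1.0:
            raise ValueError("expert_ratio must be in [0, 1]")
        self.expert: List[Any] = list(expert_data)
        self.agent: Deque[Any] = deque(maxlen=capacity)  # oldest dropped first
        self.expert_ratio = expert_ratio
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        """Store one (s, a, r, s', done) transition from environment interaction."""
        self.agent.append(transition)

    def sample(self, batch_size: int) -> List[Any]:
        """Draw a shuffled batch. Expert data fill any shortfall of agent data,
        so early batches consist mostly of expert transitions."""
        n_expert_target = round(batch_size * self.expert_ratio)
        n_agent = min(batch_size - n_expert_target, len(self.agent))
        n_expert = min(batch_size - n_agent, len(self.expert))
        batch = (self.rng.sample(self.expert, n_expert)
                 + self.rng.sample(list(self.agent), n_agent))
        self.rng.shuffle(batch)
        return batch
```

Early in training the agent buffer is nearly empty, so batches come almost entirely from expert transitions. This is one way to reduce the early-stage randomness described above.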

Method used

Examples

Embodiment 1

[0098] A method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology. The implementation of the present invention is divided into two stages, the task environment construction stage and the floating strategy training stage, and includes the following steps:

[0099] 1. Define the task environment and model:

[0100] 1-1. Construct the task environment of the target area where the underwater vehicle is located and the dynamic model of the underwater vehicle;

[0101] The underwater vehicle simulation task environment is written in Python in the VS Code integrated development environment. The geographic coordinate system E-ξηζ of the simulated pool map is shown in Figure 3. The three-dimensional pool measures 50 m × 50 m × 50 m, and the successful floating area of the target area is a cylindrical area with the center of the wa...
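
The geometry checks such an environment needs can be sketched in Python. Only the 50 m pool size comes from the text. The following are all hypothetical placeholders, since the description of the target region is cut off:

- the origin at a pool corner
- ζ measured downward as depth from the surface
- a vertical cylinder axis
- the cylinder's center, radius, and depth extent

```python
import math
from dataclasses import dataclass

POOL_SIZE_M = 50.0  # from the text: 50 m x 50 m x 50 m pool


@dataclass(frozen=True)
class CylinderTarget:
    """Vertical cylindrical floating-success region (hypothetical parameters)."""
    center_xi: float
    center_eta: float
    radius: float
    depth_max: float  # region spans 0 <= zeta <= depth_max (zeta = depth, assumed)

    def contains(self, xi: float, eta: float, zeta: float) -> bool:
        horizontal = math.hypot(xi - self.center_xi, eta - self.center_eta)
        return horizontal <= self.radius and 0.0 <= zeta <= self.depth_max


def in_pool(xi: float, eta: float, zeta: float, size: float = POOL_SIZE_M) -> bool:
    """True if the position lies inside the cube [0, size]^3 (origin at a corner, assumed)."""
    return all(0.0 <= c <= size for c in (xi, eta, zeta))


def episode_status(xi: float, eta: float, zeta: float, target: CylinderTarget) -> str:
    """Hypothetical episode outcome used to end a simulated floating attempt."""
    if target.contains(xi, eta, zeta):
        return "success"
    if not in_pool(xi, eta, zeta):
        return "out_of_bounds"
    return "running"
```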

Abstract

The invention relates to an underwater vehicle target area floating control method based on double-critic reinforcement learning technology, and belongs to the technical field of ocean control experiments. The method is built on the DDPG algorithm framework from deep reinforcement learning. During training of the underwater vehicle agent, it uses both previously obtained expert data and the interaction data generated as the agent interacts with the task environment, and mixing the two greatly improves the convergence speed of the algorithm. In addition, two mutually independent groups of critic networks are used. The loss function of the actor network is built from the minimum of the Q(s, a) values output by the two critic networks, which effectively reduces the overestimation bias present in reinforcement learning algorithms.
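
The abstract states that the actor's loss is built from the minimum of the two critics' Q(s, a) outputs. The NumPy sketch below implements that loss, assuming `q1` and `q2` are the outputs of two hypothetical independent critic networks evaluated at (s, μ(s)). It also runs a toy Monte Carlo experiment showing why a minimum curbs overestimation. In that experiment, the true Q values are all 0 and the estimation noise is Gaussian; both are assumptions.

```python
import numpy as np


def double_critic_actor_loss(q1: np.ndarray, q2: np.ndarray) -> float:
    """Actor loss from two independent critics evaluated at (s, mu(s)):
    L = -mean(min(Q1, Q2)). Minimizing L pushes the actor toward actions
    that both critics rate highly."""
    return -float(np.mean(np.minimum(q1, q2)))


def overestimation_demo(n_actions: int = 10, trials: int = 10_000,
                        noise_std: float = 1.0, seed: int = 0):
    """Toy experiment: every true Q value is 0 and each critic observes it
    with independent Gaussian noise. Choosing the action critic 1 rates
    highest (as a greedy actor would) biases critic 1's value upward;
    min(Q1, Q2) at that action is at most critic 2's unbiased estimate,
    so on average it does not exceed the true value (it may underestimate)."""
    rng = np.random.default_rng(seed)
    q1 = rng.normal(0.0, noise_std, size=(trials, n_actions))
    q2 = rng.normal(0.0, noise_std, size=(trials, n_actions))
    best = np.argmax(q1, axis=1)  # greedy with respect to critic 1
    rows = np.arange(trials)
    single = q1[rows, best].mean()
    double = np.minimum(q1[rows, best], q2[rows, best]).mean()
    return float(single), float(double)


if __name__ == "__main__":
    print(double_critic_actor_loss(np.array([1.0, 4.0]), np.array([3.0, 2.0])))  # -1.5
    single, double = overestimation_demo()
    print(f"single-critic estimate: {single:.2f}, min of two critics: {double:.2f}")
```

Taking the minimum of two critics resembles the clipped double-Q idea popularized by TD3, which applies the minimum in the critic's Bellman target. This sketch follows the abstract instead, and this excerpt does not say whether the patent also uses the minimum in the target.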

Description

Technical field

[0001] The invention relates to a method for controlling the floating of an underwater vehicle to a target area based on double-critic reinforcement learning technology, and belongs to the technical field of ocean control experiments.

Background art

[0002] As key marine equipment, underwater vehicles are widely used in scientific research and engineering fields such as ocean topographic mapping, resource exploration, archaeological investigation, pipeline maintenance, and biological monitoring, and are an important means for humans to explore the ocean. However, the seabed environment is complex and changeable. An underwater vehicle working in such an environment may encounter a fault or strong interference. If it then fails to float up to the area where the mother ship is located in a timely, safe, and intelligent manner, economic losses and the loss of important data are inevitable. Therefore, in order to enhance the adaptabil...

Claims

Application Information

IPC (8): G06F30/28; G06N3/04; G06N3/08; B63G8/18; B63G8/14

CPC: G06N3/08; G06F30/28; B63G8/14; B63G8/18; G06N3/045

Inventors: 李沂滨, 张天泽, 缪旭弘, 魏征, 尤岳, 周广礼, 贾磊, 庄英豪, 宋艳

Owner: SHANDONG UNIV
