Robot Plume Tracking Method Based on Reinforcement Learning in Continuous State Behavior Domain

A continuous-state reinforcement learning technique, applied to neural learning methods, instruments, manipulators, etc., that addresses problems such as high search cost, inability to adapt to the environment, and incomplete use of available information.

Active Publication Date: 2019-09-27
TSINGHUA UNIV
Cites: 5 | Cited by: 0

AI Technical Summary

Problems solved by technology

First, existing methods make few judgments: the robot acts only on whether plume concentration is detected and on the time since the signal was lost, without using other available information such as the magnitude and direction of the flow field, the position where the plume was lost, and the robot's own action at that time. The state is therefore not considered comprehensively.
Second, the robot's actions within the plume are overly simple. Moving straight upstream, or at a fixed angle to the upstream flow, works well in a long, narrow, straight plume but struggles with the complex plumes of the deep sea.
Finally, the plume-reacquisition mechanism is a fixed set of search actions whose range is set manually in advance; it cannot adapt to the current environment, resulting in high search cost.

Method used




Embodiment Construction

[0082] The robot plume tracking method based on reinforcement learning in a continuous state-action domain proposed by the present invention is described in further detail below in conjunction with the accompanying drawings and specific embodiments.

[0083] The proposed method describes robot plume tracking as a sequential decision process. At the initial moment, the underwater robot obtains the current deep-sea environmental information, including signals detectable by its sensors, such as the local current velocity and the concentration of the hydrothermal plume. These signals are combined into the state vector required for single-step path planning; the state vector is input to the current decision neural network, which outputs the robot's forward direction at this moment. After the robot runs at a constant speed for a period of time, the state vector is updated for the new moment.
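The single decision step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the choice of state features, the network sizes, and the squashing of the output to a heading angle are all assumptions.

```python
import numpy as np

def build_state(flow_velocity, flow_direction, concentration):
    """Combine sensor readings into the state vector for one planning step.

    The feature set (speed, heading encoded as cos/sin, plume concentration)
    is a hypothetical example of the kind of signals the patent describes.
    """
    return np.array([flow_velocity, np.cos(flow_direction),
                     np.sin(flow_direction), concentration])

class PolicyNetwork:
    """Tiny two-layer network mapping a state vector to a heading angle."""

    def __init__(self, state_dim=4, hidden_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(scale=0.1, size=(state_dim, hidden_dim))
        self.w2 = rng.normal(scale=0.1, size=(hidden_dim, 1))

    def heading(self, state):
        h = np.tanh(state @ self.w1)
        # Squash the scalar output to a continuous heading in (-pi, pi),
        # matching the continuous action domain the method requires.
        return float(np.pi * np.tanh(h @ self.w2))

state = build_state(flow_velocity=0.3, flow_direction=1.2, concentration=0.05)
policy = PolicyNetwork()
theta = policy.heading(state)  # forward direction for this time step
```

Because the network output is continuous rather than a choice among fixed directions, the robot is not limited to "straight upstream" or "fixed-angle" maneuvers.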



Abstract

The invention proposes a robot plume tracking method based on reinforcement learning in a continuous state-action domain, belonging to the field of underwater robot path planning. The method trains the underwater robot's path planning for locating hydrothermal plume vents. During each plume-tracking episode, the robot generates a state vector and inputs it to the current decision neural network; the network outputs the robot's forward direction at that moment, and the robot moves at a constant speed. After running for a period of time, the state vector is updated for the new moment and checked against a termination condition: if the condition is met, the episode ends and the robot is reinitialized at a new starting position; if not, the robot continues forward at the next moment. Throughout this process, a reinforcement learning algorithm updates the decision neural network at each moment until the algorithm converges. The invention learns quickly, converges well, improves the flexibility with which the robot tracks plumes to hydrothermal vents, and reduces search cost.
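The episode structure described in the abstract can be sketched as a loop that repeatedly queries a policy, advances the robot at constant speed, and checks a termination condition. The toy 2-D environment, the greedy stand-in policy, and the termination radius below are illustrative assumptions; they are not the patent's environment or learning algorithm.

```python
import numpy as np

def run_episode(step_fn, policy_fn, init_state, max_steps=100):
    """Run one plume-tracking episode until termination or the step limit."""
    state, trajectory = init_state, []
    for _ in range(max_steps):
        heading = policy_fn(state)                 # network picks forward direction
        next_state, done = step_fn(state, heading) # constant-speed move
        trajectory.append((state, heading, next_state))
        state = next_state
        if done:                                   # termination condition met
            break
    return trajectory

# Toy 2-D environment: a hypothetical vent at SOURCE; the episode
# terminates when the robot comes within 0.5 units of it.
SOURCE = np.array([5.0, 5.0])

def step_fn(pos, heading, speed=0.5):
    new_pos = pos + speed * np.array([np.cos(heading), np.sin(heading)])
    return new_pos, np.linalg.norm(new_pos - SOURCE) < 0.5

def greedy_policy(pos):
    # Stand-in for the trained decision network: head straight at the source.
    d = SOURCE - pos
    return float(np.arctan2(d[1], d[0]))

traj = run_episode(step_fn, greedy_policy, np.array([0.0, 0.0]))
```

In the patented method, `greedy_policy` would be replaced by the decision neural network, and an RL algorithm would update its weights from the collected `traj` transitions after each episode until convergence.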

Description

technical field [0001] The invention belongs to the field of underwater robot path planning, and in particular relates to a robot plume tracking method based on reinforcement learning in a continuous state-action domain. Background technique [0002] Deep-sea hydrothermal activity and its associated life are among the major discoveries of 20th-century marine science. Because deep-sea hydrothermal vents are closely related to seafloor spreading and polymetallic sulfide mineralization, and involve frontier scientific questions such as the evolution of biological communities in hydrothermal environments and the impact of hydrothermal activity on global climate change, the study of deep-sea hydrothermal fluids has become a hot topic in ocean research. [0003] To further study deep-sea hydrothermal fluids, the locations of unknown hydrothermal vents in the deep sea must be explored. Researchers have found that deep-sea hydrothermal vents emit hydrothe...

Claims


Application Information

Patent Type & Authority Patents(China)
IPC(8): G06K9/66 G06N3/08 B25J9/16
CPC: G06N3/08 B25J9/1664 G06V30/194
Inventor: 宋士吉 (Song Shiji), 牛绿茵 (Niu Lvyin)
Owner TSINGHUA UNIV