Disordered grabbing multi-objective optimization method and system based on deep reinforcement learning

A multi-objective optimization technique based on deep reinforcement learning, applied to disordered grasping by a robotic arm, that improves the selection of grasp points.

Active Publication Date: 2021-09-03
常州唯实智能物联创新中心有限公司

AI Technical Summary

Problems solved by technology

[0004] The reward functions used by existing Q networks are all discrete: the outcome of an action is sorted into cases by thresholds, and each case is assigned a different reward. Such feedback suits situations that can be defined in advance. In target grasping, however, many factors that affect grasp quality vary continuously, such as the grasping path and the power consumption of the robotic arm. The effect of these variables is hard to predict in advance, so it is not possible to decide beforehand how the reward value should change across situations.
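To make the limitation concrete, the following minimal sketch shows a threshold-based (discrete) reward of the kind described above. The function name `discrete_reward`, the 0.5 m threshold, and the reward values are hypothetical and chosen only for illustration; none of them come from the patent.

```python
def discrete_reward(grasp_succeeded: bool, path_length_m: float) -> float:
    """Hypothetical threshold-based reward: outcomes fall into fixed buckets."""
    if not grasp_succeeded:
        return 0.0
    # Assumed threshold: paths shorter than 0.5 m count as "good".
    return 1.0 if path_length_m < 0.5 else 0.5


# Two successful grasps with clearly different path lengths land in the same
# bucket, so the continuous difference between them never reaches the network.
short_path = discrete_reward(True, 0.10)
longer_path = discrete_reward(True, 0.49)
just_over = discrete_reward(True, 0.51)
print(short_path, longer_path, just_over)
```

Under these assumed values, a 0.10 m path and a 0.49 m path earn the same reward, while a 0.51 m path is penalized sharply. Any fixed set of thresholds has this property, which is why such a reward cannot express gradual changes in path or power consumption.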



Examples


Embodiment 1

[0061] As shown in Figure 1, Embodiment 1 provides a multi-objective optimization method for disordered grasping based on deep reinforcement learning. Two parallel, independent Q networks process the same scene at the same moment. The robotic arm executes a grasp at the grasp point selected by each network and returns parameters such as the execution path and the grasping power consumption. The two Q networks are then compared on these grasp results (execution path, power consumption, and so on), and corresponding reward values are generated. The Q network receives feedback from both internal and external reward functions. This removes the limitation that the reward function of a single Q network can only take discrete values: continuous data such as execution path and power consumption are added to the reward function, which further optimizes the selection of grasp points.
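One way to picture the comparison step is the sketch below. It is not the patent's formula, which does not appear in this excerpt. It assumes that the internal reward is a binary grasp-success signal and that the external reward is a signed, normalized difference in path length and power consumption between the two networks' attempts. The names `GraspOutcome`, `relative_gain`, `external_reward`, `total_reward` and the weights `w_path` and `w_power` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GraspOutcome:
    """Result returned after the arm executes one network's grasp point."""
    succeeded: bool
    path_length: float  # assumed non-negative, e.g. metres travelled
    power: float        # assumed non-negative, e.g. energy consumed


def relative_gain(own: float, other: float) -> float:
    """Signed advantage in [-1, 1] for non-negative inputs; positive when `own` is smaller."""
    total = own + other
    return 0.0 if total == 0 else (other - own) / total


def external_reward(own: GraspOutcome, other: GraspOutcome,
                    w_path: float = 0.5, w_power: float = 0.5) -> float:
    """Assumed continuous reward obtained by comparing the two attempts."""
    return (w_path * relative_gain(own.path_length, other.path_length)
            + w_power * relative_gain(own.power, other.power))


def total_reward(own: GraspOutcome, other: GraspOutcome) -> float:
    """Internal (discrete success) reward plus external (continuous) reward."""
    internal = 1.0 if own.succeeded else 0.0
    return internal + external_reward(own, other)


# Both grasps succeed, but the first uses a shorter path and less power.
first = GraspOutcome(succeeded=True, path_length=0.40, power=8.0)
second = GraspOutcome(succeeded=True, path_length=0.60, power=12.0)
print(total_reward(first, second), total_reward(second, first))
```

In this sketch both attempts receive the same internal reward, and the external term separates them: the network whose grasp point led to the shorter path and lower power consumption ends up with the higher total. The normalization keeps the external term bounded, so it refines the success signal rather than overwhelming it; this is a design choice of the sketch, not something stated in the source.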

[0062] Specifically, the multi-objective optimization me...

Embodiment 2

[0102] Referring to Figure 2, this embodiment provides a multi-objective optimization system for disordered grasping based on deep reinforcement learning. The system includes a virtual scene construction module, a task establishment module, a virtual shooting module, an output module, an execution module, a calculation module, a feedback module, and a predictive model generation module.

[0103] The virtual scene construction module is adapted to construct a virtual scene in which a robotic arm grasps multiple objects.

[0104] The task establishment module is adapted to establish two parallel and independent deep reinforcement learning networks to handle the task of disordered grasping of multiple targets. Specifically, the task establishment module performs the following steps:

[0105] S121: Establish two parallel and independent deep reinforcement learning networks, namely a first network and a second network, wherein the first network and the second network have the sam...

Embodiment 3

[0136] This embodiment provides a computer-readable storage medium in which at least one instruction is stored. When the instruction is executed by a processor, the multi-objective optimization method for disordered grasping based on deep reinforcement learning provided by Embodiment 1 is implemented.

[0137] The multi-objective optimization method for disordered grasping based on deep reinforcement learning uses two parallel and independent Q networks to process the same scene at the same moment. The robotic arm executes a grasp at the grasp point selected by each network and returns parameters such as the execution path and the grasping power consumption. The two Q networks are then compared on these grasp results (execution path, power consumption, and so on), and corresponding reward values are generated. The Q network receives feedback from both internal and external reward functions, which solves the problem ...


Abstract

The invention belongs to the field of disordered grasping by robotic arms, and in particular relates to a multi-objective optimization method and system for disordered grasping based on deep reinforcement learning. In the method, two parallel and independent Q networks process the same scene at the same moment, and the robotic arm executes a grasp at the grasp point selected by each network and returns parameters such as the execution path and the grasping power consumption. The two Q networks are compared on these grasp results (execution path, power consumption, and so on), and corresponding reward values are generated. The Q network receives feedback from both internal and external reward functions. This solves the problem that the reward function of a single Q network can only take discrete values: continuous data such as the execution path and the grasping power consumption are added to the reward function, so that the selection of grasp points is further optimized.

Description

Technical field

[0001] The invention relates to the field of disordered grasping by robotic arms, and in particular to a multi-objective optimization method and system for disordered grasping based on deep reinforcement learning.

Background

[0002] With the development of robot technology, the application scenarios of robotic disordered grasping continue to expand. Reinforcement learning methods that use the grasp success rate as the network training target cannot effectively meet the differing multi-index requirements of disordered grasping in different application scenarios. Efficient multi-objective optimization of a robot's disordered grasping behavior therefore has practical significance for improving the robot's ability to perform customized work and for expanding its application scenarios.

[0003] The deep reinforcement learning algorithm has obvious intelligence and robustness, based on the feedback of the environment, t...


Application Information

IPC(8): G06Q10/04, G06Q10/06, G06K9/00, G06N3/08, G06N3/04
CPC: G06Q10/04, G06Q10/067, G06N3/08, G06N3/045
Inventor: 肖利民, 张华梁, 何智涛, 秦广军, 韩萌, 杨钰杰, 王良, 孙锦涛
Owner: 常州唯实智能物联创新中心有限公司