
A Network Optimal Tracking Control Method Based on Off-Policy Q-Learning

A tracking control and networked control system technology, applied in the field of network optimal tracking control based on off-policy Q-learning

Active Publication Date: 2022-02-15
LIAONING UNIVERSITY OF PETROLEUM AND CHEMICAL TECHNOLOGY

AI Technical Summary

Problems solved by technology

However, the use of off-policy Q-learning methods to compensate for packet loss and solve the optimal tracking control problem when the system model parameters are unknown has not been studied; this is the motivation for the present invention.




Detailed Description of the Embodiments

[0026] The present invention will be described in detail below in conjunction with examples.

[0027] 1. Optimization with packet loss compensation

[0028] The invention introduces the linear quadratic tracking (LQT) problem and the model of network-induced packet loss, and formulates the quadratic tracking problem for a networked control system subject to data packet loss.

[0029] Consider the following linear discrete-time system

[0030] x(k+1) = Ax(k) + Bu(k), y(k) = Cx(k) (1)

[0031] where x(k) is the state of the controlled object, of dimension n; u(k) is the control input, of dimension m; y(k) is the controlled output, of dimension p; and A, B and C are matrices of compatible dimensions.

[0032] The reference signal is generated as follows

[0033] r(k+1) = Fr(k) (2)

[0034] where r(k) is the reference input, of dimension p, and F is a matrix of dimension p×p. In this tracking problem, the output y(k) of system (1) is required to track the reference input r(k).

[0035] Let X(k) = [x(k)^T r(k)^T]^T; then from formula (1) and formula (2), the following augmented system is obtained

[0036] X(k+1) = TX(k) + B1u(k) (3)

[0037] where T = [A 0; 0 F] and B1 = [B^T 0]^T.

[0038]...
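To make the formulation concrete, the following minimal Python sketch builds the augmented system (3) and closes the loop with predictor-based state feedback over a lossy network channel, in the spirit of the packet-loss compensation discussed above. All numerical values (A, B, C, F, the feedback gain K, and the loss probability) are illustrative assumptions, not values from the patent, and the gain is a placeholder rather than the learned optimal one.

```python
import numpy as np

# Illustrative plant, reference model, and loss probability; the patent does
# not publish numerical values, so everything here is an assumption.
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])                 # constant reference: r(k+1) = r(k)
p_loss = 0.2                          # Bernoulli packet-loss probability
n, m = B.shape
p = F.shape[0]

# Augmented system (3): X(k+1) = T X(k) + B1 u(k)
T  = np.block([[A, np.zeros((n, p))], [np.zeros((p, n)), F]])
B1 = np.vstack([B, np.zeros((p, m))])

K = np.array([[0.5, 0.2, -0.3]])      # placeholder stabilizing gain (assumed)

rng = np.random.default_rng(0)
X = np.array([1.0, -0.5, 1.0])        # initial [x(0); r(0)]
X_hat = X.copy()                      # predictor state used when packets drop

for k in range(50):
    # Network channel: with probability p_loss the measurement X(k) is lost
    # and the controller falls back on its model-based prediction.
    received = rng.random() > p_loss
    X_fb = X if received else X_hat

    u = -K @ X_fb                     # predictor-based state feedback
    X_hat = T @ X_fb + B1 @ u         # one-step-ahead prediction
    X = T @ X + B1 @ u                # true augmented dynamics

    e = (C @ X[:n] - X[n:]).item()    # tracking error y(k) - r(k)
    print(f"k={k:2d} received={received} error={e:+.4f}")
```

The predictor update uses the same augmented model as the plant, so whenever a packet does arrive the prediction resets to the true state; this is only one simple compensation scheme consistent with the predictor-based feedback the abstract mentions, not the patent's exact construction.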



Abstract

A network optimal tracking control method based on off-policy Q-learning relates to a network tracking control method. The invention proposes a new off-policy Q-learning based tracking control method for networked control systems with packet loss. The method fully utilizes measurable data and enables the system to track the target in an approximately optimal way when the system model parameters are unknown and data loss occurs in the network communication. The invention does not require the system model parameters to be known, and uses the measurable data of the networked control system to learn the optimal tracking control strategy based on predictor state feedback; moreover, the algorithm can guarantee the unbiasedness of the solution of the Q-function-based iterative Bellman equation. Simulation verifies the effectiveness of the proposed method.
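As a rough illustration of the kind of method the abstract describes, the sketch below runs an off-policy Q-learning iteration on the augmented LQT model from the embodiment: one fixed batch of data is collected under an exploratory behavior policy, the Q-function kernel H is then solved from the Bellman equation by least squares, and the target-policy gain is improved from H. The model matrices, the weights Qy and R, and the discount factor gamma are assumptions for illustration, not values from the patent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative augmented LQT model (assumed, not taken from the patent).
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
F = np.array([[1.0]])                 # constant reference generator
n, m, p = 2, 1, 1

T  = np.block([[A, np.zeros((n, p))], [np.zeros((p, n)), F]])
B1 = np.vstack([B, np.zeros((p, m))])
nz = n + p + m                        # length of z = [X; u]

Qy, R, gamma = 1.0, 1.0, 0.8          # assumed weights and discount factor
M  = np.hstack([C, -np.eye(p)])       # tracking error: y - r = M X
Q1 = M.T @ (Qy * M)                   # stage cost: X'Q1X + u'Ru

# Collect ONE batch of data under an exploratory behavior policy;
# off-policy learning lets every later iteration reuse this same batch.
X, data = np.array([1.0, -0.5, 1.0]), []
for k in range(200):
    u = rng.normal(0.0, 1.0, size=m)  # behavior policy: pure exploration
    Xn = T @ X + B1 @ u
    data.append((X, u, Xn))
    X = Xn

K = np.zeros((m, n + p))              # initial target-policy gain (assumed)
for it in range(10):
    Phi, c = [], []
    for X, u, Xn in data:
        z  = np.concatenate([X, u])   # behavior action stored in the data
        un = -K @ Xn                  # target policy evaluated at next state
        zn = np.concatenate([Xn, un])
        # Bellman equation z'Hz = cost + gamma * zn'Hzn, linear in vec(H)
        Phi.append(np.kron(z, z) - gamma * np.kron(zn, zn))
        c.append(X @ Q1 @ X + u @ (R * u))
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = 0.5 * (h.reshape(nz, nz) + h.reshape(nz, nz).T)  # symmetric kernel
    Huu, HuX = H[n + p:, n + p:], H[n + p:, :n + p]
    K = np.linalg.solve(Huu, HuX)     # policy improvement: u = -K X
print("learned feedback gain K =\n", K)
```

Because the action evaluated at the next state comes from the target policy rather than from the recorded data, the same batch serves every iteration; this data reuse is the off-policy property the abstract alludes to, shown here only in schematic form.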

Description

Technical field

[0001] The invention relates to a network tracking control method, in particular to a network optimal tracking control method based on off-policy Q-learning.

Background technique

[0002] Reinforcement learning is a learning method that uses "trial and error" interactions with the environment to find strategies that maximize the expected cumulative reward. According to whether the behavior policy is consistent with the target policy during the learning process, reinforcement learning can be divided into on-policy learning and off-policy learning. If, during learning, the behavior policy used for action selection is consistent with the target policy being improved, the method is called on-policy learning; otherwise it is called off-policy learning.

[0003] Off-policy RL has several advantages over on-policy learning and has desirable properties: (a) it solves the exploration-exploitation dilemma, since the system can adopt any behavioral strate...
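As a concrete illustration of this distinction (not taken from the patent), a behavior policy can add exploration noise to whatever gain is currently applied, while the target policy being evaluated and improved stays deterministic:

```python
import numpy as np

rng = np.random.default_rng(2)
K = np.array([[0.5, 0.2]])            # current gain (illustrative)

def behavior_policy(x):
    # Applied to the plant during data collection: the exploration noise
    # means the data covers actions the target policy would never take.
    return -K @ x + rng.normal(0.0, 0.3, size=1)

def target_policy(x):
    # Evaluated and improved through the Bellman equation; it never has
    # to generate the data, which is what makes the method off-policy.
    return -K @ x
```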


Application Information

Patent Type & Authority: Patent (China)
IPC(8): G05B13/04
CPC: G05B13/042
Inventor: 李金娜
Owner: LIAONING UNIVERSITY OF PETROLEUM AND CHEMICAL TECHNOLOGY