
38 results about "Partially observable Markov decision process" patented technology

A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must maintain a probability distribution over the set of possible states, based on a set of observations and observation probabilities, and the underlying MDP.
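The belief maintenance described above is a discrete Bayes filter: the agent pushes its belief through the transition model, then reweights by the likelihood of the received observation. A minimal sketch, in which the two-state transition matrix, observation model, and probabilities are illustrative assumptions rather than values from any patent below:

```python
# Discrete POMDP belief update: b'(s') ∝ O(o | s', a) * Σ_s T(s' | s, a) b(s)
# Toy two-state, one-action example; T and O values are assumptions.

def update_belief(belief, T, O, action, obs):
    """One Bayes-filter step over a finite state set."""
    n = len(belief)
    # Predict: push the belief through the transition model for `action`.
    predicted = [sum(T[action][s][s2] * belief[s] for s in range(n))
                 for s2 in range(n)]
    # Correct: weight by the probability of the received observation.
    unnorm = [O[action][s2][obs] * predicted[s2] for s2 in range(n)]
    z = sum(unnorm)
    if z == 0:
        raise ValueError("observation impossible under current belief")
    return [p / z for p in unnorm]

# T[a][s][s'] and O[a][s'][o] for one action, two states, two observations.
T = [[[0.9, 0.1], [0.2, 0.8]]]
O = [[[0.8, 0.2], [0.3, 0.7]]]
b = update_belief([0.5, 0.5], T, O, action=0, obs=0)
print(b)  # belief shifts toward state 0, which best explains obs 0
```

The normalizer `z` is the probability of the observation given the prior belief, which some POMDP solvers reuse when evaluating policies.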

System and method for optimizing communications using reinforcement learning

A system and method for automatically optimizing states of communications and operations in a contact center, using a reinforcement learning module comprising a reinforcement learning server and an optimization server introduced into the contact center's existing infrastructure. A model is set up as a partially observable Markov chain, with a Baum-Welch algorithm used to infer its parameters; rewards are then added to form a partially observable Markov decision process, which is solved to provide an optimal action policy for each state of the contact center, ultimately optimizing states of communications and operations for overall return.
Owner:NEW VOICE MEDIA LIMITED

Access network service function chain deployment method based on random learning

The invention relates to an access network service function chain deployment method based on random learning, and belongs to the technical field of wireless communication. The method establishes an access network service function chain deployment scheme based on a partially observable Markov decision process with partially perceived topology, addressing the high delay caused by physical network topology changes in the 5G cloud access network scenario. Under 5G access network uplink conditions, changes in the underlying physical network topology are perceived through a heartbeat-packet observation mechanism; because observation errors prevent the complete true topology from being acquired, deployment of the access network slice's service function chain is adaptively and dynamically adjusted using partial perception and random learning based on the partially observable Markov decision process, so that slice delay on the access network side can be optimized. Dynamic deployment is realized by deciding the optimal service function chain deployment mode from the partially perceived network topology changes, optimizing delay and enhancing resource utilization.
Owner:CHONGQING UNIV OF POSTS & TELECOMM

Frequency spectrum detection method based on partially observable Markov decision process model

The invention relates to a frequency spectrum detection method based on a partially observable Markov decision process model. The method comprises the following steps: channel state information is added to a channel state history sequence, and time delay is estimated so that the channel state information is obtained; the initial belief state and state transition probability of each channel are calculated; statistics of channel usage states and state transition probabilities are acquired through observation over a period of time, and a Markov model is established for the usage state of each channel; as the time slot advances, the state history sequence and the current time slot value are updated; the instantaneous reward is calculated, with the belief state updated from the response information according to the channels' state transition probabilities; the value function of each channel after performing different actions is calculated; and the maximum discounted return obtainable by secondary users is calculated to derive a strategy maximizing the total discounted reward. The channels are sorted in decreasing order of total reward, and when data transmission is required the user is guided to attempt access to the channels in this new order.
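The channel-ordering step above ranks channels by the discounted return a secondary user can expect from each one. A minimal sketch, assuming each channel is a two-state (idle/busy) Markov chain with reward 1 whenever the channel is idle; the discount factor, idle probabilities, and transition parameters are all illustrative assumptions:

```python
# Ranking channels by expected discounted return. Per-channel parameters
# (initial idle probability, P(idle->idle), P(busy->idle)) are assumptions.

GAMMA = 0.9  # discount factor (assumed)

def discounted_value(p_idle, p_stay_idle, p_busy_to_idle, horizon=50):
    """Expected discounted reward of repeatedly sensing one channel,
    earning reward 1 whenever the channel is idle."""
    value, p = 0.0, p_idle
    for t in range(horizon):
        value += (GAMMA ** t) * p              # reward 1 w.p. P(idle at t)
        p = p * p_stay_idle + (1 - p) * p_busy_to_idle  # one chain step
    return value

channels = {
    "ch0": (0.6, 0.9, 0.3),   # often idle and stays idle
    "ch1": (0.8, 0.5, 0.5),   # idle now, but memoryless
    "ch2": (0.2, 0.95, 0.1),  # mostly busy at first
}
ranked = sorted(channels,
                key=lambda c: discounted_value(*channels[c]),
                reverse=True)
print(ranked)  # channels in decreasing order of discounted return
```

Sorting by this value reproduces the patent's "decreasing order of total reward" access order for this toy model.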
Owner:INST OF ACOUSTICS CHINESE ACAD OF SCI

Robot optimal path planning method based on partially observable Markov decision process

The invention discloses a robot optimal path planning method based on a partially observable Markov decision process. A robot searches for an optimal path to a target position, taking a POMDP model and the SARSOP algorithm as a basis and using a GLS search method as the heuristic condition during the search. For continuous-state, large-scale observation space problems, the method reduces the number of belief upper- and lower-bound updates compared with earlier classical algorithms, which use trials as the heuristic condition and repeatedly update multiple similar paths, without affecting the final optimal strategy, thereby improving algorithm efficiency; at the same time, the robot obtains a better strategy and finds a better path.
Owner:SUZHOU UNIV

Pre-processing method for a point-based partially observable Markov decision process

Inactive · CN101398914A · Fast convergence · Overcoming the problem of high computational complexity · Mathematical models · Pretreatment method · Reward value
The invention provides a pre-processing method for a point-based partially observable Markov decision process. The method comprises the following steps: 1. pre-processing before iteration, which comprises: a. a point set is sampled by random interaction with the environment; b. the reward function of each sampling point is computed and stored; c. pseudo-inheritance points are computed and stored; and d. ending; 2. pre-processing at each iteration step, which comprises: e. a basis vector is computed and stored; and f. ending; and 3. single-point, single-step iteration, which comprises: g. a reward value table and a candidate vector table for each sampling point are computed; h. the optimal action is computed and the basis vector obtained; i. the basis vector is corrected by an error term; and j. ending. By pre-processing each sampled belief point and introducing the concept of the basis vector, the method avoids a mass of repeated and meaningless computations and speeds up the algorithm by a factor of 2-4.
Owner:NANJING UNIV

Process automatic decision-making and reasoning method and device, computer equipment and storage medium

Pending · CN114647741A · Dynamic · Improve decision-making response speed · Data processing applications · Neural architectures · Decision model · Decision taking
The invention belongs to the field of deep learning, and relates to a process automatic decision-making and reasoning method and device, computer equipment and a storage medium. The method comprises the steps: constructing a part production process knowledge base model; constructing a three-level information model of part information, process knowledge and equipment information, integrating production data and constructing a process time-sequence knowledge graph; extracting process knowledge features from the process time-sequence knowledge graph; splitting the production task with an automatic decision-making model based on the process knowledge features, extracting the spatial features of the subtasks, and retrieving the process knowledge meeting the spatial features and time-sequence requirements from the part production process knowledge base for process decision-making; and adopting a partially observable Markov decision process algorithm, with the process time-sequence knowledge graph defined as the environment, to carry out process reasoning. Reasoning is carried out on an unknown production process; after a reasoning path is obtained and manually verified, it is added to the process time-sequence knowledge graph, making the process knowledge more complete.
Owner:GUANGDONG POLYTECHNIC NORMAL UNIV

Heterogeneous Internet of Vehicles user association method based on multi-agent deep reinforcement learning

The invention discloses a heterogeneous Internet of Vehicles user association method based on multi-agent deep reinforcement learning. The method first models the problem as a partially observable Markov decision process and then employs the idea of decomposing a team value function. Specifically, a centralized-training, distributed-execution framework is built in which the team value function is connected to each user's value function through summation, so that the user value functions are trained implicitly. Then, drawing on experience replay and a target network mechanism, action exploration and selection are performed with an epsilon-greedy strategy, historical information is stored using a recurrent neural network, and a Huber loss function is used to calculate the loss and perform gradient descent, finally learning the association strategy of the heterogeneous Internet of Vehicles users. Compared with a multi-agent independent deep Q-learning algorithm and other traditional algorithms, the method can more effectively improve energy efficiency while reducing switching overhead in a heterogeneous Internet of Vehicles environment.
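The summation-based decomposition described above can be sketched in a few lines: the team value is the sum of per-user values, so each user can act greedily on its own value function while training targets are computed on the team sum. The tabular Q-values, action count, and epsilon below are illustrative assumptions, not details from the patent:

```python
import random

# VDN-style decomposition sketch: team Q = sum of per-user Qs.
# Tabular Qs, epsilon, and the action set are illustrative assumptions.

random.seed(0)
EPSILon = 0.1
N_ACTIONS = 3  # e.g. candidate base stations per user (assumed)

def team_q(per_user_q, joint_action):
    """Centralized team value: sum of each user's Q for its own action."""
    return sum(q[a] for q, a in zip(per_user_q, joint_action))

def select_actions(per_user_q, epsilon=EPSILon):
    """Distributed execution: each user picks its own action using only
    its own value function (epsilon-greedy)."""
    actions = []
    for q in per_user_q:
        if random.random() < epsilon:
            actions.append(random.randrange(N_ACTIONS))
        else:
            actions.append(max(range(N_ACTIONS), key=lambda a: q[a]))
    return actions

per_user_q = [[0.2, 0.7, 0.1], [0.9, 0.3, 0.4]]
greedy = select_actions(per_user_q, epsilon=0.0)
print(greedy, team_q(per_user_q, greedy))
```

Because the team Q is a plain sum, the joint action that maximizes it is exactly each user maximizing its own Q, which is what makes distributed execution consistent with centralized training.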
Owner:NANJING UNIV OF SCI & TECH

Multi-robot collaborative navigation and obstacle avoidance method

Active · CN113821041A · Good local minima · Generalization strategy · Position/course control in two dimensions · Vehicles · Algorithm · Engineering
The invention discloses a multi-robot collaborative navigation and obstacle avoidance method. The method comprises the following steps: the decision process of a robot in an unknown environment is modeled as a partially observable Markov decision process; according to the current robot's environment modeling information, a deep deterministic policy gradient algorithm is introduced, and sampled image samples are extracted and input into a convolutional neural network for feature extraction; the deep deterministic policy gradient algorithm is improved by introducing a long short-term memory neural network so that the network has memory, and a frame-skipping mechanism makes the image data more accurate and stable; meanwhile, the experience replay mechanism is modified, with a priority set for each stored experience sample so that rare but important experiences are applied more to learning, improving learning efficiency; and finally, a multi-robot navigation and obstacle avoidance simulation system is established. The method has the advantage that, by adopting a curriculum learning mode, the robot learns navigation and obstacle avoidance from easy to difficult, accelerating training.
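The modified replay mechanism above, where each experience carries a priority so rare but important samples are drawn more often, can be sketched as proportional prioritized sampling. The priority exponent and the toy experiences are illustrative assumptions:

```python
import random

# Proportional prioritized replay sketch: sampling probability is
# priority^ALPHA / sum of all such weights. ALPHA and the stored
# experiences are illustrative assumptions.

random.seed(1)
ALPHA = 0.6  # how strongly priorities skew sampling (assumed)

class PrioritizedReplay:
    def __init__(self):
        self.buffer, self.priorities = [], []

    def add(self, experience, priority):
        self.buffer.append(experience)
        self.priorities.append(priority)

    def sample(self, k):
        weights = [p ** ALPHA for p in self.priorities]
        return random.choices(self.buffer, weights=weights, k=k)

replay = PrioritizedReplay()
replay.add(("common transition",), priority=0.1)
replay.add(("rare collision",), priority=5.0)  # few but important
batch = replay.sample(1000)
rare = sum(1 for e in batch if e == ("rare collision",))
print(rare)  # the high-priority experience dominates the batch
```

In practice the priority is usually tied to the temporal-difference error of the sample, so experiences the network predicts poorly are replayed more.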
Owner:SUN YAT SEN UNIV

Computation offloading and resource management method in edge computing based on deep reinforcement learning

Pending · CN113821346A · Maximize self-interest · Address different interest pursuits · Resource allocation · Program loading/initiating · Edge node · Engineering
The invention discloses a computation offloading and resource management method in edge computing based on deep reinforcement learning, which comprises the following steps: constructing an edge computing communication model based on a partially observable Markov decision process, the model comprising M + N agents, the M agents being edge nodes and the N agents being users; setting a target optimization function according to a user-cost-minimization target and an edge-node-utility-maximization target; setting a time slot length and a time frame length, and initializing the time slot and time frame; having the edge nodes and the users each use the partially observable Markov decision process to obtain a resource allocation strategy and a task offloading strategy; optimizing the target optimization function, according to the task offloading strategy and the resource allocation strategy, using an actor-critic model; and dividing and processing the computing task according to the optimized target optimization function. The invention resolves the differing interests pursued by the edge devices and the users, and ensures each party's interests to the maximum extent.
Owner:TIANJIN UNIV

Systems and methods for operating robots using object-oriented partially observable markov decision processes

A system and method of operating a mobile robot to perform tasks includes representing a task in an Object-Oriented Partially Observable Markov Decision Process model having at least one belief pertaining to a state and at least one observation space within an environment, wherein the state is represented in terms of classes and objects and each object has at least one attribute and a semantic label. The method further includes receiving a language command identifying a target object and a location corresponding to the target object, updating the belief associated with the target object based on the language command, driving the mobile robot to the observation space identified in the updated belief, searching the updated observation space for each instance of the target object, and providing notification upon completing the task. In an embodiment, the task is a multi-object search task.
Owner:BROWN UNIVERSITY +1

Delay tolerant network routing algorithm based on multi-agent reinforcement learning

The invention discloses a delay tolerant network routing algorithm based on multi-agent reinforcement learning, characterized by the following steps: 1. applying the Louvain clustering algorithm to the delay tolerant network nodes, providing a layered architecture that is both centralized and distributed; 2. modeling the DTN next-hop selection problem as a decentralized partially observable Markov decision process (Dec-POMDP) model in combination with positive social characteristics. Compared with existing delay tolerant network routing schemes based on social attributes, the layered architecture of this scheme conveniently captures the social information of edge devices: on one hand, routing decisions issued by the computing center are executed in a distributed manner; on the other hand, the routing algorithms are trained centrally at the computing center according to states transmitted by the service units. Social characteristics can thus be effectively utilized for routing and forwarding in the delay tolerant network, improving the delivery rate and reducing the average delay.
Owner:BEIJING UNIV OF POSTS & TELECOMM +1

Individual movement intervention infectious disease prevention and control method and system

The invention provides an individual-movement-intervention infectious disease prevention and control method and system. The method comprises the following steps: acquiring daily historical state information and individual relationship information of individual users in a target city within a preset time interval; and inputting the historical state information and the individual relationship information into a trained individual-movement-intervention infectious disease prevention and control model to obtain prevention and control intervention measures for each user individual in the target city. The trained model is obtained by training a graph neural network, a long short-term memory network, and an intelligent agent on sample user individual state information and sample individual relationship information, the agent being constructed on the basis of a partially observable Markov decision process; the sample users' individual state information includes health state information of latent infected persons converting into overt infected persons. The invention can reduce the number of infected people as much as possible under low travel intervention.
Owner:TSINGHUA UNIV

Discontinuous large-bandwidth transponder frequency spectrum monitoring method

The invention discloses a discontinuous large-bandwidth transponder frequency spectrum monitoring method, applied to the process of monitoring a plurality of transponders of a satellite with monitoring equipment. The method comprises the following steps: S1, establishing a transponder Markov model based on a partially observable Markov decision process; S2, the monitoring equipment obtains an optimal transponder selection strategy through a reinforcement learning method; S3, monitoring the plurality of transponders of the satellite according to the optimal transponder selection strategy; S4, acquiring the transponder spectrum with a robust spectrum acquisition method based on compressed sampling; and S5, the receiving and processing module completes down-conversion, analog-to-digital conversion, and cognitive processing in the digital domain. The method optimally schedules resolution in space, time, and frequency, improving the effectiveness of the monitoring data.
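Step S2 above, where the monitoring equipment learns which transponder to watch by reinforcement learning, can be sketched as a simple epsilon-greedy value-update loop. The transponder activity probabilities, learning rate, and unit reward for catching an active transponder are illustrative assumptions:

```python
import random

# Sketch of step S2: learning which transponder to monitor via an
# epsilon-greedy incremental value estimate. Activity probabilities,
# learning rate, and rewards are illustrative assumptions.

random.seed(2)
ACTIVITY = [0.2, 0.7, 0.4]   # P(transponder is transmitting) -- assumed
ALPHA, EPSILON = 0.1, 0.1

q = [0.0] * len(ACTIVITY)
for step in range(5000):
    if random.random() < EPSILON:
        t = random.randrange(len(ACTIVITY))           # explore
    else:
        t = max(range(len(ACTIVITY)), key=lambda i: q[i])  # exploit
    reward = 1.0 if random.random() < ACTIVITY[t] else 0.0  # useful capture?
    q[t] += ALPHA * (reward - q[t])  # incremental value estimate

best = max(range(len(ACTIVITY)), key=lambda i: q[i])
print(best, [round(v, 2) for v in q])  # settles on the busiest transponder
```

The learned strategy concentrates monitoring time on the transponder most likely to be transmitting, which is the scheduling behavior the patent's step S3 then applies.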
Owner:NAT INNOVATION INST OF DEFENSE TECH PLA ACAD OF MILITARY SCI

Information processing apparatus, information processing method, and computer program

The invention discloses an information processing apparatus, an information processing method, and a computer program, making it possible to provide a device and method for executing a grounding process using a POMDP (Partially Observable Markov Decision Process). The POMDP's observation information includes analysis information from a language analysis unit, into which a user utterance is input and which executes language analysis, and task realization information from a task management unit which executes tasks. Since the grounding process, i.e. the recognition of the user request expressed by a user utterance, is executed by applying the POMDP, a user request can be effectively, rapidly, and accurately recognized and a task executed based on it.
Owner:SONY CORP