191 results for "Action selection" patented technology

Action selection is a way of characterizing the most basic problem of intelligent systems: what to do next. In artificial intelligence and computational cognitive science, "the action selection problem" is typically associated with intelligent agents and animats—artificial systems that exhibit complex behaviour in an agent environment. The term is also sometimes used in ethology or animal behavior.

Automatic driving system based on reinforcement learning and multi-sensor fusion

The invention discloses an automatic driving system based on reinforcement learning and multi-sensor fusion. The system comprises a perception system, a control system and an execution system. The perception system efficiently processes data from a laser radar, a camera and a GPS navigator through a deep learning network, realizing real-time identification and understanding of the vehicles, pedestrians, lane lines, traffic signs and signal lamps around the running vehicle. Through reinforcement learning, the laser radar data and a panoramic image are matched and fused to form a real-time three-dimensional streetscape map and to determine the drivable area, and the GPS navigator is combined to realize real-time navigation. The control system adopts a reinforcement learning network to process the information collected by the perception system and to predict the people, vehicles and objects around the vehicle. Based on vehicle body state data, records of driver actions are paired, the current optimal action selection is made, and the execution system carries out the corresponding motion. The invention fuses laser radar data with video to perform drivable-area identification and optimal route planning to the destination.
Owner:Tsinghua University Suzhou Automotive Research Institute (Wujiang)
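
A minimal Python sketch of the perceive-fuse-select loop this abstract describes: fused sensor features feed a Q-function whose greedy maximum is the "current optimal action selection". The action list, feature sizes, and linear Q-function are invented stand-ins, not the patented implementation.

```python
# Hypothetical illustration: fused perception features -> greedy action.
import numpy as np

ACTIONS = ["keep_lane", "brake", "accelerate", "turn_left", "turn_right"]

class QController:
    """Toy linear Q-function over fused perception features."""
    def __init__(self, n_features, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.weights = rng.normal(scale=0.1, size=(n_actions, n_features))

    def select_action(self, features):
        q_values = self.weights @ features   # one Q-value per action
        return int(np.argmax(q_values))      # greedy action selection

def fuse(lidar_vec, camera_vec, gps_vec):
    """Stand-in for lidar/panorama/GPS fusion: concatenate feature vectors."""
    return np.concatenate([lidar_vec, camera_vec, gps_vec])

controller = QController(n_features=12, n_actions=len(ACTIONS))
features = fuse(np.zeros(4), np.zeros(4), np.zeros(4))
print("chosen action:", ACTIONS[controller.select_action(features)])
```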

Controller

A controller is provided, operable to control a system with at least partial autonomy on the basis of measurement data received from a plurality of sensors indicative of a state of the system, in environments in which it is not possible to fully determine the state of the system from such sensor measurement data. The controller comprises: a system model, defining at least a set of probabilities for the dynamical evolution of the system and corresponding measurement models for the plurality of sensors; a stochastic estimator operable to receive measurement data from the sensors and, with reference to the system model, to generate a plurality of samples each representative of the state of the system; a rule set corresponding to the system model, defining, for each of a plurality of possible samples representing possible states of the system, information defining an action to be carried out in the system; and an action selector, operable to receive the output of the stochastic estimator and to select, with reference to the rule set, information defining one or more corresponding actions to be performed in the system.
Owner:BEAS SYST INC
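
The estimator-plus-rule-set architecture reads like a particle-filter front end with a vote over state samples. The sketch below assumes that interpretation; the random-walk system model, the thresholds in the rule set, and the majority-vote selector are all illustrative assumptions, not the claimed design.

```python
# Hypothetical sketch: stochastic estimator -> rule set -> action selector.
import numpy as np

def stochastic_estimator(particles, measurement, noise_std=0.5, rng=None):
    rng = rng or np.random.default_rng(0)
    # Predict: apply the system model (a random walk is assumed here).
    particles = particles + rng.normal(0.0, 0.1, size=particles.shape)
    # Weight: likelihood of the measurement under each particle.
    weights = np.exp(-0.5 * ((measurement - particles) / noise_std) ** 2)
    weights /= weights.sum()
    # Resample to obtain samples representative of the state posterior.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

def rule_set(state_sample):
    """Maps a possible state to an action (illustrative thresholds)."""
    if state_sample < -1.0:
        return "increase"
    if state_sample > 1.0:
        return "decrease"
    return "hold"

def action_selector(samples):
    """Pick the action most of the state samples agree on."""
    actions = [rule_set(s) for s in samples]
    return max(set(actions), key=actions.count)

particles = np.zeros(200)
for z in [0.2, 1.4, 1.6]:                    # noisy sensor measurements
    particles = stochastic_estimator(particles, z)
    print(z, "->", action_selector(particles))
```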

Dynamic spectrum access method based on policy-planning-constrained Q-learning

The invention provides a dynamic spectrum access method based on policy-planning-constrained Q-learning, which comprises the following steps: the cognitive user partitions the frequency-spectrum state space and selects the reasonable, legal state space; the state space is ranked and modularized; each ranked module completes Q-table initialization before Q-learning begins; each module independently executes the Q-learning algorithm; actions are selected according to the learning rule; the action finally adopted by the cognitive user is obtained by a strategic decision that comprehensively considers all learning modules; whether the selected access spectrum conflicts with authorized users is determined, and if so, the collision probability is computed, otherwise the next step is executed; whether the environmental policy-planning knowledge base has changed is determined, and if so, the knowledge base is updated and the learned Q value is adjusted; these steps are repeated until learning converges. The method improves overall system performance, overcomes the learning blindness of the agent, enhances learning efficiency, and speeds up convergence.
Owner:COMM ENG COLLEGE SCI & ENGINEERING UNIV PLA
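
A compact sketch of the constrained-Q-learning idea: the policy-planning knowledge base prunes the space to "legal" channels before the Q-table is initialized, and epsilon-greedy learning then runs only over that space. The channel count, reward model, and occupied set are mocked for illustration.

```python
# Hypothetical illustration of policy-constrained Q-learning for channel access.
import random

N_CHANNELS = 8
legal = {0, 1, 2, 5, 6}            # policy planning: pruned legal action space
Q = {ch: 0.0 for ch in legal}      # Q-table initialized over legal actions only
alpha, gamma, eps = 0.1, 0.9, 0.2

def env_reward(channel):
    """Mock environment: collision with an authorized user gives -1, else +1."""
    occupied = {1, 6}
    return -1.0 if channel in occupied else 1.0

for step in range(500):
    if random.random() < eps:                       # explore
        a = random.choice(sorted(legal))
    else:                                           # exploit
        a = max(Q, key=Q.get)
    r = env_reward(a)
    Q[a] += alpha * (r + gamma * max(Q.values()) - Q[a])

print({ch: round(q, 2) for ch, q in Q.items()})     # conflict channels score low
```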

Medical interrogation dialogue system and reinforcement learning method applied to medical interrogation dialogue system

The invention discloses a medical interrogation dialogue system and a reinforcement learning method applied to it, and relates to the technical field of medical information. The system comprises a natural language understanding module for classifying user intentions and filling slot values to form structured semantic frames; a dialogue management module for interacting with the user through a robot agent, taking the dialogue state as input, making action decisions on the semantic frame through a decision network, and outputting the final system action selection; a user simulator for carrying out natural language interaction with the dialogue management module and outputting user action selections; and a natural language generation module for receiving the system and user action selections and letting the user check them by generating sentences similar to human language with a template-based method. Medical knowledge about the relations between diseases and symptoms is introduced as a guide, and the interrogation history is enriched through continuous interaction with a simulated patient. The reasonability of the inquired symptoms and the accuracy of disease diagnosis are improved, making the diagnosis results more credible.
Owner:Dark Matter Intelligent Technology (Guangzhou) Co., Ltd.
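
A toy version of the interaction loop among the dialogue manager, the user simulator, and the template-based NLG described above. The disease-symptom knowledge, templates, and decision rule are invented for illustration; the patent uses a learned decision network rather than this hand-written rule.

```python
# Hypothetical sketch: dialogue manager <-> user simulator, template NLG.
import random

KNOWLEDGE = {"flu": {"fever", "cough"}, "allergy": {"sneezing", "itchy_eyes"}}
TEMPLATES = {"request": "Do you have {}?", "inform": "You may have {}."}

def dialogue_manager(confirmed, asked):
    """Decision step: diagnose once a disease's symptoms are all confirmed,
    otherwise request an unasked symptom."""
    for disease, symptoms in KNOWLEDGE.items():
        if symptoms <= confirmed:
            return ("inform", disease)
    candidates = {s for ss in KNOWLEDGE.values() for s in ss} - asked
    return ("request", random.choice(sorted(candidates)))

def user_simulator(symptom, true_disease="flu"):
    """Simulated patient: answers truthfully for the assumed true disease."""
    return symptom in KNOWLEDGE[true_disease]

confirmed, asked = set(), set()
for _ in range(6):
    act, arg = dialogue_manager(confirmed, asked)
    print("SYS:", TEMPLATES[act].format(arg))       # template-based NLG
    if act == "inform":
        break
    asked.add(arg)
    if user_simulator(arg):
        confirmed.add(arg)
        print("USR: yes")
    else:
        print("USR: no")
```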

Unmanned aerial vehicle autonomous air combat decision framework and method

The invention discloses an unmanned aerial vehicle autonomous air combat decision framework and method, belonging to the field of computer simulation. The framework comprises a domain-knowledge-based air combat decision module, a deep network learning module, a reinforcement learning module and an air combat simulation environment. The air combat decision module generates an air combat training data set and outputs it to the deep network learning module; a deep network, a Q-value fitting function and an action selection function are obtained through learning and output to the reinforcement learning module. The air combat simulation environment uses the learned air combat decision function to carry out a self-play air combat process and records the process data to form a reinforcement learning training set. The reinforcement learning module uses this training set to optimize and improve the Q-value fitting function, yielding an air combat strategy with better performance. The framework fits the inherently complex Q function more accurately and quickly, improves the learning effect, largely prevents the Q function from converging to a local optimum, and constructs a closed-loop air combat decision optimization process that needs no external intervention.
Owner:BEIHANG UNIV
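
The two-stage training the abstract outlines, supervised fitting of a Q-value function to domain-knowledge decisions followed by reinforcement-learning refinement, can be sketched as below; the linear Q-function, the toy decision rule, and the mock reward are assumptions, not the patent's air-combat model.

```python
# Hypothetical two-stage sketch: supervised pretraining, then TD refinement.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(3, 4))     # linear Q: 3 actions, 4 features

def q_values(state):
    return W @ state

# Stage 1: fit Q to domain-knowledge labels (toy rule: prefer action 0
# when the first feature is positive, otherwise action 1).
for _ in range(2000):
    s = rng.normal(size=4)
    target = np.full(3, -1.0)
    target[0 if s[0] > 0 else 1] = 1.0
    W -= 0.01 * np.outer(q_values(s) - target, s)   # squared-error gradient

# Stage 2: TD(0) refinement on self-play-style transitions with a mock reward.
gamma = 0.95
for _ in range(2000):
    s, s_next = rng.normal(size=4), rng.normal(size=4)
    a = int(np.argmax(q_values(s)))
    r = 1.0 if (a == 0) == (s[0] > 0) else -1.0
    td_target = r + gamma * np.max(q_values(s_next))
    W[a] -= 0.01 * (q_values(s)[a] - td_target) * s

print("greedy action for s[0]>0:", int(np.argmax(q_values(np.array([1.0, 0, 0, 0])))))
```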

Resource allocation method based on multi-agent reinforcement learning in mobile edge computing system

The invention discloses a resource allocation method based on multi-agent reinforcement learning in a mobile edge computing system, which comprises the following steps: (1) dividing a wireless channel into a plurality of subcarriers, where each user can select only one subcarrier; (2) letting each user randomly select a channel and computing resources, then calculating the time delay and energy consumption generated by offloading; (3) comparing the delay and energy consumption of local computation with those of offloading to the edge cloud, and judging whether the offloading succeeds; (4) obtaining a reward value for the current offloading action through multi-agent reinforcement learning and calculating a value function; (5) letting each user perform action selection according to the strategy function; and (6) varying the user's learning rate to update the strategy and obtain an optimal action set. Based on variable-rate multi-agent reinforcement learning, the computing and wireless resources of the mobile edge server are fully utilized, and the utility function of each intelligent terminal is maximized while the necessity of user offloading is considered.
Owner:SOUTHEAST UNIV
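
A toy rendering of steps (2) through (6): each user keeps per-channel preferences, receives a reward from a mocked delay/energy cost of offloading, and updates a softmax policy with a decaying (variable) learning rate. The cost model and rate schedule are invented for illustration.

```python
# Hypothetical multi-agent learner with a variable learning rate.
import numpy as np

rng = np.random.default_rng(1)
n_users, n_channels = 3, 2
prefs = np.zeros((n_users, n_channels))    # per-user channel preferences

def offload_cost(channel_users):
    """Mocked delay+energy: cost grows as users share a subcarrier."""
    return 1.0 * channel_users

for t in range(1, 301):
    lr = 1.0 / t                                   # variable learning rate
    probs = np.exp(prefs) / np.exp(prefs).sum(axis=1, keepdims=True)
    choices = [rng.choice(n_channels, p=probs[u]) for u in range(n_users)]
    load = np.bincount(choices, minlength=n_channels)
    for u in range(n_users):
        local_cost = 2.0                            # cost of computing locally
        reward = local_cost - offload_cost(load[choices[u]])
        prefs[u, choices[u]] += lr * reward         # offload only if cheaper

print(np.round(probs, 2))                           # learned channel policies
```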

Intelligent vehicle speed decision-making method based on deep reinforcement learning and simulation method thereof

The invention discloses an intelligent vehicle speed decision-making method based on deep reinforcement learning. The method comprises the steps of constructing the state space S, action space A and instant reward space R of a Markov decision model for an intelligent vehicle passing through an intersection; initializing a neural network and constructing an experience pool; performing action selection with an epsilon-greedy algorithm and filling the resulting experience into the experience pool; randomly selecting a batch of experiences from the pool and training the neural network by stochastic gradient descent; and completing the vehicle's speed decision at the current moment with the latest network, adding the new experience to the pool, and sampling again for the next round of training. The invention further discloses a simulation method for this speed decision-making approach, whose advantage is that simulation experiments are carried out in a deep reinforcement learning simulation system built on the MATLAB Automated Driving Toolbox.
Owner:JILIN UNIV
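
The epsilon-greedy selection and experience-replay steps described here are the standard DQN recipe; the sketch below substitutes a linear Q approximator and a mocked intersection environment for the patent's neural network and state space, so every name and number in it is an assumption.

```python
# Hypothetical epsilon-greedy selection + experience replay (DQN-style).
import random
import numpy as np

rng = np.random.default_rng(0)
ACTIONS = [-1.0, 0.0, 1.0]                   # decelerate / hold / accelerate
W = rng.normal(scale=0.1, size=(len(ACTIONS), 3))
replay = []                                  # the experience pool

def select_action(state, eps=0.1):
    """Epsilon-greedy over Q-values (the action-selection step)."""
    if random.random() < eps:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(W @ state))

def train_batch(batch, lr=0.01, gamma=0.99):
    """Stochastic gradient descent on sampled experiences."""
    for s, a, r, s_next in batch:
        td_target = r + gamma * np.max(W @ s_next)
        W[a] -= lr * ((W @ s)[a] - td_target) * s

for episode in range(200):
    s = rng.normal(size=3)                   # mock state: speed, gap, signal
    a = select_action(s)
    r = -abs(ACTIONS[a])                     # mock reward: prefer smooth speed
    s_next = rng.normal(size=3)
    replay.append((s, a, r, s_next))         # fill the experience pool
    if len(replay) >= 32:
        train_batch(random.sample(replay, 32))
```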

Cold-start system and method for dialogue strategy optimization

The invention relates to a cold-start system and method for dialogue strategy optimization. The system comprises a user input module, a dialogue state tracking module, a teacher decision-making module, a student decision-making module, an action selection module, an output module, a strategy training module and a reward function module. The action selection module randomly selects one final reply action from the reply actions generated by the teacher and student decision-making modules. The output module converts the final reply action into a more natural expression and displays it to the user. The strategy training module stores each dialogue experience (transition) in an experience pool, samples a fixed number of experiences, and updates the network parameters according to the deep Q-network (DQN) algorithm. The reward function module calculates the reward at each dialogue round and outputs it to the strategy training module. The invention significantly improves the performance of the dialogue strategy during the initial stage of reinforcement-learning online training, speeds up learning, and reduces the number of dialogues needed to reach a given performance level.
Owner:AISPEECH CO LTD
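
A minimal mock-up of the action selection module's arbitration: the teacher and student each propose a reply, one is chosen uniformly at random, and the transition lands in the experience pool for a DQN-style update. The reply set, reward signal, and tabular student are illustrative stand-ins for the patented networks.

```python
# Hypothetical teacher/student arbitration with an experience pool.
import random

REPLIES = ["greet", "ask_slot", "confirm", "bye"]
q_student = {a: 0.0 for a in REPLIES}        # tabular stand-in for the DQN

def teacher_policy(turn):
    return REPLIES[min(turn, len(REPLIES) - 1)]   # fixed scripted order

def student_policy():
    return max(q_student, key=q_student.get)

experience_pool = []
for turn in range(4):
    proposals = [teacher_policy(turn), student_policy()]
    action = random.choice(proposals)             # random final-reply pick
    reward = 1.0 if action == teacher_policy(turn) else 0.0
    experience_pool.append((turn, action, reward))
    # DQN-flavoured update on one sampled experience.
    t, a, r = random.choice(experience_pool)
    q_student[a] += 0.1 * (r - q_student[a])
    print(f"turn {turn}: chose {action!r}, reward {reward}")
```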

Apparatus and method for selecting multiple items in a graphical user interface

A computer readable medium includes executable instructions to identify an alternative selection mode within a graphical user interface. A set of selected items is linked during the alternative selection mode in response to selection of each item by a single input action.
Owner:BUSINESS OBJECTS SOFTWARE
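
A small sketch of the claimed mechanism: while an alternative selection mode is active (say, a modifier key is held), each single input action links the clicked item into the selection set instead of replacing it. The event model and class names here are hypothetical.

```python
# Hypothetical selection model for single-input multi-select.
class SelectionModel:
    def __init__(self):
        self.selected = []            # linked set of selected items

    def click(self, item, alternative_mode=False):
        if alternative_mode:
            if item not in self.selected:
                self.selected.append(item)   # link item into the selection
        else:
            self.selected = [item]           # normal mode: single selection

model = SelectionModel()
model.click("row1")
model.click("row2", alternative_mode=True)   # e.g. modifier key held down
model.click("row3", alternative_mode=True)
print(model.selected)                        # ['row1', 'row2', 'row3']
```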