53 results about "Strategic learning" patented technology

Dialog strategy online realization method based on multi-task learning

The invention discloses a dialog strategy online realization method based on multi-task learning. According to the method, corpus information of a man-machine dialog is acquired in real time, and current user state features and user action features are extracted and assembled into the training input. The single accumulated reward value used in dialog strategy learning is then split into a dialog round number reward value and a dialog success reward value, which serve as training annotations, and two different value models are optimized at the same time through multi-task learning during online training. Finally, the two reward values are merged and the dialog strategy is updated. The method adopts a reinforcement learning framework and optimizes the dialog strategy through online learning, so rules and strategies need not be designed manually for each domain, and the method can adapt to domain information structures of different complexity and to data of different scales. Because the original task of optimizing a single accumulated reward value is split and the parts are optimized simultaneously through multi-task learning, a better network structure is learned and the variance during training is lowered.
Owner:AISPEECH CO LTD
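
The reward split described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the per-turn penalty of -1, the success bonus of +20 and the discount factor are all assumptions chosen for the example, and the two value models themselves are omitted.

```python
from typing import List, Tuple


def split_rewards(num_turns: int, success: bool,
                  turn_penalty: float = -1.0,
                  success_bonus: float = 20.0) -> Tuple[List[float], List[float]]:
    """Split one dialog's reward into a round-number stream and a success stream.

    The penalty and bonus values are hypothetical, chosen only for illustration.
    """
    turn_rewards = [turn_penalty] * num_turns
    success_rewards = [0.0] * num_turns
    if success and num_turns > 0:
        success_rewards[-1] = success_bonus  # success is only known at the last turn
    return turn_rewards, success_rewards


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Discounted return from each turn onward, computed backwards."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def training_targets(num_turns: int, success: bool, gamma: float = 0.99):
    """Targets for the two value models, plus their merged sum."""
    turn_r, succ_r = split_rewards(num_turns, success)
    g_turn = discounted_returns(turn_r, gamma)
    g_succ = discounted_returns(succ_r, gamma)
    merged = [a + b for a, b in zip(g_turn, g_succ)]
    return g_turn, g_succ, merged
```

In this sketch each value model would be regressed against its own target list, and the merged list plays the role of the recombined reward used to update the dialog strategy.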

Efficient dialogue policy learning

Efficient exploration of natural language conversations associated with dialogue policy learning may be performed using probabilistic distributions. Exploration may comprise identifying key terms associated with received natural language input, which may include converting raw text of the input into a structured representation and using that representation to identify the terms. Exploration may also comprise mapping at least one of the key terms to an action to be performed by the computer system in response to receiving natural language input associated with the at least one key term. Mapping may be performed using a probabilistic distribution, and the action may then be performed by the computer system. A replay buffer may also be utilized by the computer system to track what has occurred in previous conversations. The replay buffer may be pre-filled with one or more successful dialogues to jumpstart exploration.
Owner:MICROSOFT TECH LICENSING LLC
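
The pre-filled replay buffer mentioned in this abstract can be illustrated with a short sketch. The class name, the method names and the representation of a transition as a plain tuple are assumptions made for this example; the abstract does not specify the data layout or the sampling scheme.

```python
import random
from collections import deque
from typing import Any, Iterable, List, Sequence


class ReplayBuffer:
    """Fixed-capacity buffer of dialogue transitions (hypothetical layout)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        # A bounded deque drops the oldest transition once capacity is reached.
        self._items: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        self._items.append(transition)

    def prefill(self, dialogues: Iterable[Sequence[Any]]) -> None:
        """Load transitions from known-successful dialogues before training."""
        for dialogue in dialogues:
            for transition in dialogue:
                self.add(transition)

    def sample(self, batch_size: int) -> List[Any]:
        """Uniformly sample up to batch_size stored transitions."""
        k = min(batch_size, len(self._items))
        return self._rng.sample(list(self._items), k)

    def __len__(self) -> int:
        return len(self._items)
```

Here `prefill` would be called once with a handful of successful dialogues, so that the first sampled batches already contain rewarding experience.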

Intelligent agent control method and device based on reinforcement learning

The invention relates to an intelligent agent control method and device based on reinforcement learning. The method comprises the steps of: obtaining the current local observation of an intelligent agent; taking the current local observation as the input of a reinforcement learning model and obtaining the current execution action of the intelligent agent output by the model; and controlling the intelligent agent to execute that action. According to the technical scheme provided by the invention, the strategy learning process in a large-scale multi-agent system can be effectively simplified, the number and types of intelligent agents are easy to expand, and the method has potential value in large-scale real-world applications.
Owner:天津(滨海)人工智能军民融合创新中心

Multi-agent action strategy learning method and device, medium and computing equipment

The embodiment of the invention provides a multi-agent action strategy learning method. The method comprises the steps of: multiple agents sampling corresponding actions according to their respective initial action strategies; estimating the advantage each agent obtains after executing its action; and updating the action strategy of each agent based on those advantages, so that each updated action strategy enables the corresponding agent to obtain a higher return. The method is applied to task-processing-oriented machine learning scenarios and trains a plurality of cooperative agents, that is, a plurality of action strategies, at the same time. The agents do not interact with a pre-built simulator and no manual supervision is needed, which greatly saves time and resources. In addition, different rewards are assigned to the agents so that all of them can learn better action strategies.
Owner:TSINGHUA UNIV

Multi-vehicle collaborative planning method based on distributed crowd-sourcing learning

The invention discloses a multi-vehicle collaborative planning method based on crowd-sourcing learning, and belongs to the technical field of multi-vehicle-road collaborative decision making. An edge server is used to reduce the computing and communication capability required of each vehicle. An evolutionary game models the continuous game between vehicles during route planning, and when the game reaches a stable state each vehicle obtains the routing decision that maximizes its own benefit. An intersection passing decision-making module is deployed on each vehicle, each vehicle is regarded as an independent decision maker, and the cooperative driving behavior of multiple vehicles at the intersection is modeled using the strategy learning capability of deep reinforcement learning. A traffic situation prediction module is deployed at the roadside edge for computation, and the communication between vehicles and roads is used to extend traffic situation perception beyond each vehicle's limited field of view. The invention optimizes the space-time utilization of the intersection and of the road resources around it, and increases the throughput of the intersection.
Owner:BEIJING UNIV OF POSTS & TELECOMM

Student learning emotion recognition method based on convolutional neural network

Pending | CN113657168A | Solve the problem of difficult emotion perception | Solve the problem that is not easy to perceive | Character and pattern recognition | Neural architectures | Online learning | Biology
The invention discloses a student learning emotion recognition method based on a convolutional neural network. The method comprises the following steps: classifying the expressions of a student through a convolutional neural network model, dividing the learning emotion of the student into positive emotion and negative emotion according to the expressions, storing the student information and emotion information, and feeding the learning emotion of the student back to teachers, parents and students. The method addresses the problem that students' emotions are not easy to perceive, helps teachers optimize classroom settings and pay attention to students' learning emotions, and plays a positive role in guaranteeing the classroom effect. It can also support online learning and the detection of student engagement, so that educators can adjust teaching strategies and learners can adjust their learning states.
Owner:XIAN UNIV OF TECH

Information physical system safety control method based on deep reinforcement learning

The invention discloses a cyber-physical system security control method based on deep reinforcement learning, and belongs to the technical field of information security. The invention solves the problem that security control strategies designed with existing methods perform poorly under network attack. In the method, the dynamic equation of the cyber-physical system under attack is described as a Markov decision process, and on that basis the security control problem of the cyber-physical system under false data injection attack is converted into a control strategy learning problem that uses only data. Building on the soft actor-critic reinforcement learning framework, a soft actor-critic algorithm based on a Lyapunov function is proposed together with a novel deep neural network training framework; Lyapunov stability theory is incorporated into the design, which ensures the stability of the cyber-physical system and effectively improves control performance. The method can be applied to the security control of cyber-physical systems.
Owner:HARBIN INST OF TECH

Object access strategy configuration method and device

The invention relates to an object access strategy configuration method and device. The method comprises the following steps: sending a strategy acquisition instruction to a first terminal, wherein the strategy acquisition instruction instructs the first terminal to obtain a target control strategy from a trusted management server, the target control strategy indicates a control operation to be performed on behavior that accesses a target object, the target control strategy is generated through a first strategy learning process executed on a second terminal, and the first strategy learning process learns from a first access log of the target object on the second terminal; when a strategy acquisition request sent by the first terminal is received, sending the target control strategy to the first terminal in response to the request; and receiving strategy effective information sent by the first terminal, which indicates that the target control strategy has been confirmed to be in effect on the first terminal. The application solves the technical problem that object access strategies are configured relatively inefficiently in related technologies.
Owner:BEIJING KEXIN HUATAI INFORMATION TECH

Method and System for Modelling and Control Partially Measurable Systems

Pending | US20220179419A1 | High data efficiency | Substantial data efficiency in solving | Mathematical models | Kernel methods | Control system | ModelSim
A controller for controlling a system that includes a policy configured to control the system is provided. The controller includes an interface connected to the system and configured to acquire an action state and a measurement state via sensors measuring the system, a memory to store computer-executable program modules including a model learning module and a policy learning module, and a processor configured to perform steps of the program modules. The steps include offline modeling to generate offline-learning states based on the action state and measurement state using the model learning program, providing the offline states to the policy learning program to generate policy parameters, and updating the policy of the system to operate the system based on the policy parameters. To generate the policy parameters, the policy learning program uses a dropout method to improve the optimization of the policy parameters, a particle method to compute and evaluate the evolution of the particle states, and a model of the sensor and a model of the online estimator to generate online estimates of the particle states, which approximate the state estimates based on the particle states generated from the model learning program.
Owner:MITSUBISHI ELECTRIC RES LAB INC

Multi-task network model training method and system based on adaptive task weight

The invention relates to a multi-task network model training method and system based on adaptive task weights. In the method, a sharing mode is learned through a task-specific strategy that autonomously selects which layers of the multi-task network are executed, and weights matched to the tasks are searched at the same time, so that the model is trained better. The multi-task network model is reconstructed based on ResNet, and the learning strategy is optimized according to the images in the data set during training, which overcomes the uniformity of the multi-task model while improving the task metrics. A multi-task loss function suitable for regression and classification tasks is derived from maximum likelihood estimation in probability theory, so the task weights are adjusted automatically during training to improve model performance, which solves the problem of inflexible task weights.
Owner:HANGZHOU DIANZI UNIV
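
The abstract does not give its loss formula. As a hedged illustration of how a maximum-likelihood derivation can yield automatically adjusted task weights, the sketch below uses one well-known formulation, homoscedastic uncertainty weighting, in which each task has a learnable log-variance. This is an assumed stand-in and may differ from the patented loss; the function and parameter names are hypothetical.

```python
import math


def weighted_multitask_loss(reg_loss: float, cls_loss: float,
                            log_var_reg: float, log_var_cls: float) -> float:
    """Combine a regression loss and a classification loss.

    Each task carries a log-variance s = log(sigma^2), which would be a
    trainable parameter in practice. A larger s down-weights that task's
    loss, while the + 0.5 * s term penalizes inflating s without bound.
    """
    # Regression term from a Gaussian likelihood: L / (2 sigma^2) + log(sigma)
    reg_term = 0.5 * math.exp(-log_var_reg) * reg_loss + 0.5 * log_var_reg
    # Classification term (common approximation): L / sigma^2 + log(sigma)
    cls_term = math.exp(-log_var_cls) * cls_loss + 0.5 * log_var_cls
    return reg_term + cls_term
```

In a training framework the two log-variances would be optimized by gradient descent together with the network weights, which is what makes the task weighting adaptive.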

Ceph system performance optimization strategy and system based on deep reinforcement learning

The invention discloses a Ceph system performance optimization system based on deep reinforcement learning. The system is composed of a data source module, a data access mode learning module, an evaluation mechanism learning module and a system parameter adjustment learning module. The Ceph system performance optimization strategy based on deep reinforcement learning is realized through the following steps: S1, preprocessing a data source; S2, learning and classifying a Ceph file system running environment model; S3, carrying out evaluation mechanism learning; and S4, learning a Ceph file system parameter adjustment strategy. The method combines a deep reinforcement learning algorithm with interactive learning between an A2C model and the Ceph file system to obtain optimized parameters, so that the optimal system parameters for a given data access mode can be selected. The method can adapt to different data access modes and hardware configurations, the optimal system parameters are obtained through intelligent learning, the system parameters can be set according to them, and the performance of the Ceph file system is thereby improved.
Owner:STATE GRID ANHUI ELECTRIC POWER +1

Task-oriented dialogue strategy generation method

The invention relates to a task-oriented dialogue strategy generation method comprising the following steps: establishing a dialogue state tracker and determining a dialogue state space, an action space and their formalized representations; simulating a dialogue state by using a variational autoencoder; simulating a dialogue action by using a multi-layer perceptron and Gumbel-Softmax; performing adversarial training on a simulation sample generator and a discriminator; and finally training a dialogue strategy by using a reinforcement learning method. First, the simulation sample generator is used for learning a reward function, and the loss from the discriminator can be fed back directly to the generator for optimization. Second, the trained discriminator is used as the dialogue reward in the reinforcement learning process to guide dialogue strategy learning, and the dialogue strategy can be updated with any reinforcement learning algorithm. By distinguishing dialogues generated by humans from those generated by the machine, the method can infer the common information contained in high-quality human dialogues, and the learned information is then used to guide dialogue strategy learning in a new domain through transfer learning.
Owner:网经科技(苏州)有限公司
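
The Gumbel-Softmax step named in this abstract can be sketched in plain Python. This is a generic illustration of the sampling trick, assuming a hypothetical list of action logits such as a multi-layer perceptron might output; the temperature value and function name are choices made for the example, not details from the patent.

```python
import math
import random
from typing import List, Optional


def gumbel_softmax(logits: List[float], tau: float = 1.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Draw a relaxed one-hot sample over discrete actions.

    Adds Gumbel(0, 1) noise to each logit, then applies a softmax with
    temperature tau. Lower tau gives samples closer to one-hot vectors.
    """
    rng = rng or random.Random()
    noisy = []
    for logit in logits:
        # Clamp u away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel_noise = -math.log(-math.log(u))
        noisy.append((logit + gumbel_noise) / tau)
    # Numerically stable softmax: subtract the maximum before exponentiating.
    peak = max(noisy)
    exps = [math.exp(v - peak) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]
```

For example, `gumbel_softmax([1.0, 2.0, 0.5], tau=0.5)` returns three non-negative weights that sum to one. In a deep learning framework the same computation stays differentiable with respect to the logits, which is what would let a discriminator loss flow back to the generator.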

Large-scale unmanned aerial vehicle cluster flight method based on deep reinforcement learning

The invention discloses a large-scale unmanned aerial vehicle cluster flight method based on deep reinforcement learning. The method comprises the steps of: dividing the learning process of an unmanned aerial vehicle cluster anti-collision strategy into a sequence of courses, in which the cluster scale of each course is larger than that of the previous course; constructing a curriculum reinforcement learning framework based on an actor network and a critic network, and setting a group constant network based on an attention mechanism in the framework; carrying out strategy learning on each course in sequence according to the curriculum reinforcement learning framework to obtain a flight strategy for each unmanned aerial vehicle; and updating the actor network parameters and the critic network parameters of the current course during strategy learning according to the experience data of each unmanned aerial vehicle in the previous course. The invention effectively improves the learning and training efficiency of large-scale unmanned aerial vehicle clusters, effectively avoids collisions within the cluster during flight, and generalizes well.
Owner:NAT UNIV OF DEFENSE TECH