Reinforcement learning battle game AI training method based on information bottleneck theory

A reinforcement-learning and information-bottleneck technology, applied in the field of game AI learning, which addresses problems such as monotonous AI routines and the lack of flexibility found in combat between human players; by reducing the mutual information between the input and representation layers, it saves training time, speeds up training, and improves sampling efficiency.

Active Publication Date: 2021-04-30
SHANGHAI JIAO TONG UNIV
Cites: 8 · Cited by: 2

AI Technical Summary

Problems solved by technology

[0002] In recent years, with the development of deep learning technology, many achievements have been made in the field of deep reinforcement learning. More and more methods combining deep learning with reinforcement learning algorithms (such as DQN, A2C, PPO and DDPG) are used in video game AI. However, in many reinforcement learning problems the cost of interaction between the agent and the environment is very high, so it is hoped that the algorithm converges as quickly as possible to save training costs, that is, that it learns a higher-level policy from the same number of samples.
[0003] In existing battle games, the man-machine duel mode is one of the important parts of the game. Existing game AIs are designed by manually specifying a strategy distribution and a targeted action mapping, so their routines are often monotonous and lack the flexibility of combat between human players. At the same time, in existing methods of training game AI with reinforcement learning, using raw pixels as input introduces a large amount of redundant information, which reduces the efficiency of network learning and slows down the reinforcement learning algorithm.
Deep learning experiments show that during training, a neural network first memorizes the input, increasing the mutual information between the input-layer and representation-layer variables, and then compresses the input according to the specific learning task to discard useless redundant information, that is, it reduces the mutual information between the input layer and the representation layer. This is the information extraction-compression (E-C) process, and existing reinforcement learning algorithms do not yet optimize it.
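The compression half of the E-C process can be made explicit with a variational information-bottleneck penalty: a stochastic encoder q(z|x) = N(mu(x), sigma(x)²) whose KL divergence to a standard normal upper-bounds the mutual information I(X; Z), so minimizing that KL term compresses the representation. The following is a minimal sketch; the module names, layer sizes, and the use of a Gaussian encoder are illustrative assumptions, not details from the patent.

```python
# Sketch of a variational information-bottleneck (VIB) penalty.
# Minimizing the KL term compresses I(X; Z); adding it to the task loss
# (e.g. loss = task_loss + beta * kl) trades accuracy against compression.
import torch
import torch.nn as nn

class VIBEncoder(nn.Module):
    def __init__(self, in_dim=16, z_dim=4):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(in_dim, z_dim)   # log-variance of q(z|x)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        # Reparameterization trick: z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL( N(mu, sigma^2) || N(0, I) ), averaged over the batch;
        # this upper-bounds the mutual information I(X; Z).
        kl = 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()
        return z, kl

enc = VIBEncoder()
z, kl = enc(torch.randn(8, 16))
```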

Method used



Examples


Embodiment

[0051] As shown in figure 1, the present invention provides a reinforcement learning battle game AI training method based on information bottleneck theory, comprising the following steps:

[0052] 1) Initialize the network parameters and hyperparameters of the AI training model (a CNN model is used in this example; the specific model structure is shown in figure 3), and set the learning rate and the number of samples drawn from the parameter distribution;

[0053] 2) Have the AI make decisions and interact in the simulation environment to obtain a sample training batch data set;
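Step 2) can be sketched as a rollout-collection loop: the agent repeatedly acts in the simulated environment and stores (state, action, reward) samples until a batch is full. `ToyEnv`, the episode length, and the random placeholder policy below are all illustrative stand-ins, not the patent's actual battle environment.

```python
# Minimal sketch of collecting a sample training batch by interacting
# with a simulated environment (ToyEnv is a placeholder environment).
import random

class ToyEnv:
    def __init__(self):
        self.t = 0
    def reset(self):
        self.t = 0
        return [0.0, 0.0]
    def step(self, action):
        self.t += 1
        state = [float(self.t), float(action)]
        reward = 1.0 if action == 1 else 0.0
        done = self.t >= 5          # toy episodes last 5 steps
        return state, reward, done

def collect_batch(env, policy, batch_size=20):
    batch, state = [], env.reset()
    while len(batch) < batch_size:
        action = policy(state)
        next_state, reward, done = env.step(action)
        batch.append((state, action, reward))
        state = env.reset() if done else next_state
    return batch

batch = collect_batch(ToyEnv(), lambda s: random.choice([0, 1]))
```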

[0054] 3) Based on the sample training batch data set obtained from the interaction between the AI and the environment, iteratively train the AI training model with a reinforcement learning algorithm (the A2C algorithm is used in this example), saving the model parameters in stages;
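A toy version of step 3) is a single A2C-style update on a collected batch, followed by a staged checkpoint of the parameters. The network sizes, the way returns are supplied, and the checkpoint filename are illustrative assumptions; the patent's actual CNN architecture is only shown in its figure 3.

```python
# Sketch of one A2C update with staged checkpointing of model parameters.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU())
        self.policy = nn.Linear(32, n_actions)   # actor head
        self.value = nn.Linear(32, 1)            # critic head (baseline)

    def forward(self, obs):
        h = self.body(obs)
        return self.policy(h), self.value(h).squeeze(-1)

def a2c_update(model, optimizer, obs, actions, returns):
    logits, values = model(obs)
    logp = torch.log_softmax(logits, dim=-1).gather(1, actions[:, None]).squeeze(1)
    advantage = (returns - values).detach()      # no gradient through the baseline
    policy_loss = -(logp * advantage).mean()
    value_loss = (returns - values).pow(2).mean()
    loss = policy_loss + 0.5 * value_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

model = ActorCritic()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs, actions, returns = torch.randn(16, 4), torch.randint(0, 2, (16,)), torch.randn(16)
loss = a2c_update(model, opt, obs, actions, returns)
torch.save(model.state_dict(), "ai_stage_1.pt")  # save parameters in stages
```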

[0055] 4) Fix some parameters of the models saved at different stages, and use the reinforcement learning algorithm to retrain the remaining parameters for fine-tuning, obtaining final AI training models of different levels and generating the battle game AI files.
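Step 4) amounts to reloading a staged checkpoint, freezing part of the network, and fine-tuning only the remaining parameters. The sketch below freezes a shared feature extractor and fine-tunes the two output heads; the split between fixed and trainable parameters, and the two-head network itself, are assumptions for illustration.

```python
# Sketch of fixing (freezing) some saved parameters and fine-tuning the rest.
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU())
        self.policy = nn.Linear(32, n_actions)
        self.value = nn.Linear(32, 1)

model = ActorCritic()
# model.load_state_dict(torch.load("ai_stage_1.pt"))  # restore a staged model
for p in model.body.parameters():    # fix the shared feature extractor
    p.requires_grad = False
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.Adam(trainable, lr=1e-4)  # fine-tune only the heads
```

Freezing earlier-stage parameters while fine-tuning the rest yields AIs of different skill levels from the same staged checkpoints, which is how the method grades its AIs.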



Abstract

The invention relates to a reinforcement learning battle game AI training method based on information bottleneck theory. The method comprises the following steps: 1) initializing an AI training model; 2) performing decision interaction in a simulation environment through the game AI to obtain a sample training batch data set; 3) iteratively training the AI training model with a reinforcement learning algorithm on the sample training batch data set obtained from the interaction between the game AI and the environment, and storing the parameters of the AI training model in stages; and 4) fixing part of the stored parameters of the AI training models at different stages and retraining the remaining parameters with the reinforcement learning algorithm for fine-tuning, to obtain final AI training models of different levels and generate the AI file of the battle game. Compared with the prior art, the method has the advantages of high sampling efficiency, high training speed, high test flexibility, AI grading and the like.

Description

technical field

[0001] The invention relates to the field of game AI learning, and in particular to a reinforcement learning battle game AI training method based on information bottleneck theory.


Application Information

IPC(8): A63F13/67; G06N3/08
CPC: A63F13/67; G06N3/08
Inventor: 张轶飞, 程帆, 张冬梅
Owner: SHANGHAI JIAO TONG UNIV