According to the multi-aircraft cooperative air combat planning method and system based on deep reinforcement learning provided by the invention, each combat aircraft is regarded as an agent, a reinforcement learning agent model is constructed, and the network model is trained through a centralized training, distributed execution architecture. This overcomes the defect that the network model explores poorly when the actions of different entities are hard to distinguish during multi-aircraft cooperation. By embedding expert experience in the reward value, the invention solves the problem that the prior art requires a large amount of expert experience support. Through an experience sharing mechanism, all agents share one set of network parameters and one experience replay buffer, which addresses the problem that the strategy of a single agent depends not only on the feedback of its own strategy and the environment but is also influenced by the behaviors and cooperative relationships of the other agents. By increasing the sampling probability of samples whose advantage values have large absolute values, samples with extremely large or extremely small reward values can influence the training of the neural network, and the convergence of the algorithm is accelerated. Adding the policy entropy improves the exploration capability of the agent.
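
The last two mechanisms described above, sampling weighted by the absolute advantage value and the policy entropy term, can be illustrated with a minimal sketch. It is not the implementation of the invention. The function names, the priority form (|A| + eps) ** alpha, and the parameters alpha, eps and beta are assumptions introduced here for illustration only. The text states only that samples with large absolute advantage values are sampled with higher probability and that a policy entropy term is added.

```python
import math
import random


def sampling_probabilities(advantages, alpha=1.0, eps=1e-6):
    """Map advantage values to replay sampling probabilities.

    Assumed priority form: (|A| + eps) ** alpha. Both alpha and eps are
    hypothetical parameters, not values given in the text.
    """
    priorities = [(abs(a) + eps) ** alpha for a in advantages]
    total = sum(priorities)
    return [p / total for p in priorities]


def sample_batch(buffer, advantages, batch_size, rng=random):
    """Draw a batch (with replacement) from the shared replay buffer,
    weighted by the absolute advantage of each stored sample."""
    probs = sampling_probabilities(advantages)
    return rng.choices(buffer, weights=probs, k=batch_size)


def policy_entropy(action_probs):
    """Shannon entropy of a discrete action distribution (in nats)."""
    return -sum(p * math.log(p) for p in action_probs if p > 0.0)


def loss_with_entropy_bonus(policy_loss, action_probs, beta=0.01):
    """Subtract beta * entropy from the loss, so that minimizing the loss
    favours more exploratory (higher-entropy) policies.

    beta is a hypothetical weighting coefficient.
    """
    return policy_loss - beta * policy_entropy(action_probs)
```

For example, with advantage values 0.1, -2.0 and 0.5, the second sample receives the highest sampling probability because its absolute advantage is the largest, even though the advantage itself is negative. Likewise, for the same policy loss, a uniform action distribution yields a lower total loss than a deterministic one, which is how the entropy term encourages exploration.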