The invention discloses a method for obtaining the optimal carbon energy composite flow of a power grid based on swarm intelligence reinforcement learning. The method comprises the following steps: S1, establishing the objective function of a multi-objective optimal carbon energy composite flow model; S2, setting a reward function according to the objective function; S3, updating the Q-value matrix of each agent according to an eligibility trace; S4, calculating the greedy action of each agent; S5, updating the action probability matrix of each agent; S6, randomly selecting a pre-judgment action for each agent in the current state; S7, coordinating the inputs of the multiple agents and solving for the optimal action of the swarm; S8, performing the update and obtaining the corrected action values; S9, determining the control variable matrix and performing a load flow calculation; and S10, after the load flow calculation, judging whether the Q-value matrix has converged: if it has, the result of the last load flow calculation is taken as the optimal carbon energy composite flow of the power grid; if it has not, the method returns to S2. The method minimizes both the energy flow loss and the carbon emission flow loss in the power grid, guarantees good global optimization capability, and markedly improves the convergence speed of the algorithm.
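The abstract does not give the update rules, so the following Python sketch is only one plausible reading of the S1 to S10 loop, under stated assumptions. It assumes Q(lambda) with replacing eligibility traces for S3, a pursuit-style update of the action probability matrix toward the greedy action for S4 and S5, and a "swarm" that is simply several sampled joint pre-judgment actions from which the best is kept (S6 to S8). It also uses a made-up quadratic surrogate in place of a real load flow calculation (S9). The names `surrogate_load_flow`, `reward`, `solve`, `LEVELS`, and all parameter values are hypothetical. The loop is arranged so that each iteration selects an action first and then evaluates it, so the S2 reward always comes from the latest load flow result.

```python
import numpy as np

# Hypothetical toy setup: each agent owns one discrete control variable.
N_AGENTS = 3
LEVELS = np.array([0.90, 0.95, 1.00, 1.05, 1.10])  # candidate settings (assumed)
K = len(LEVELS)


def surrogate_load_flow(x):
    """Hypothetical stand-in for a real load flow calculation (S9).

    Returns made-up (energy_loss, carbon_loss) values for control vector x.
    """
    energy_loss = float(np.sum((x - 1.02) ** 2))
    carbon_loss = float(np.sum((x - 0.98) ** 2))
    return energy_loss, carbon_loss


def reward(x, w_energy=0.5, w_carbon=0.5):
    """S1/S2 sketch: reward is the negative weighted multi-objective value."""
    energy_loss, carbon_loss = surrogate_load_flow(x)
    return -(w_energy * energy_loss + w_carbon * carbon_loss)


def solve(max_iters=1500, min_iters=50, alpha=0.1, gamma=0.8, lam=0.7,
          beta=0.05, swarm=8, tol=1e-7, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_AGENTS, K, K))           # Q[agent, state, action]
    E = np.zeros_like(Q)                     # eligibility traces
    P = np.full((N_AGENTS, K, K), 1.0 / K)   # action probability matrices
    state = np.zeros(N_AGENTS, dtype=int)    # every agent starts at level 0

    for it in range(max_iters):
        # S6: each swarm member draws one pre-judgment action per agent
        # from that agent's probability row for its current state.
        candidates = np.array([
            [rng.choice(K, p=P[i, state[i]]) for i in range(N_AGENTS)]
            for _ in range(swarm)
        ])
        # S7/S8 (assumed form): keep the joint action with the best reward.
        scores = [reward(LEVELS[c]) for c in candidates]
        best = int(np.argmax(scores))
        action = candidates[best]

        # S9: control variable vector evaluated by the surrogate load flow;
        # its reward (S2) drives the next update.
        x = LEVELS[action]
        r = scores[best]

        Q_old = Q.copy()
        for i in range(N_AGENTS):
            s, a = state[i], action[i]
            s_next = a  # the next state is the level the agent just chose
            # S3: TD error applied through a replacing eligibility trace.
            delta = r + gamma * Q[i, s_next].max() - Q[i, s, a]
            E[i, s, a] = 1.0
            Q[i] += alpha * delta * E[i]
            E[i] *= gamma * lam
            # S4: greedy action in the next state.
            greedy = int(np.argmax(Q[i, s_next]))
            # S5: pursuit-style update; the row still sums to 1 afterwards.
            P[i, s_next] *= (1.0 - beta)
            P[i, s_next, greedy] += beta
        state = action.copy()

        # S10: stop once the Q matrices have (numerically) stopped changing
        # and return the result of the last load flow calculation.
        if it >= min_iters and np.max(np.abs(Q - Q_old)) < tol:
            return x, r, it + 1, True
    return x, r, max_iters, False
```

For example, `x, r, iters, converged = solve()` returns the last control vector, its reward, the number of iterations run, and whether the Q-value matrices met the convergence tolerance. In a real application, `surrogate_load_flow` would be replaced by an actual load flow solver that reports the network energy loss and carbon emission flow loss, and the swarm coordination in S7 would follow whatever scheme the full method specifies.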