Abstract
Non-cooperative multi-agent reinforcement learning (MARL) faces significant challenges due to non-stationarity and non-unique learning goals. While equilibrium-based analysis frameworks effectively address these challenges, existing approaches suffer from high computational complexity as the number of agents increases. To overcome this limitation, we propose a population game-based Q-learning (Pop-Q) algorithm that computes Nash equilibrium (NE) policies through efficient population dynamics. Our approach represents population evolution using ordinary differential equations (ODEs) and introduces two key mechanisms to reduce the complexity of solving these ODEs. By adjusting the number of iterations in population dynamics, our algorithm enables a controllable trade-off between computational complexity and equilibrium accuracy. Experimental results demonstrate that Pop-Q achieves competitive performance in two-agent settings and superior performance in three-agent environments compared to existing equilibrium-based MARL algorithms. The proposed algorithm has significant potential applications in modern systems requiring decentralized coordination, including intelligent traffic systems, warehouse automation, and autonomous aerial vehicle (AAV) swarm-aided communication networks.
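To make the idea of reaching an equilibrium through population dynamics concrete, the following is a minimal sketch and not the Pop-Q algorithm itself. The abstract does not say which population dynamics are used, so the sketch assumes the standard replicator ODE, integrated with explicit Euler steps, on a small Hawk-Dove matrix game. The payoff matrix, step size, and all function names (`replicator_step`, `run_dynamics`, `nash_gap`) are hypothetical choices made for illustration. It shows how the number of iterations trades computation for equilibrium accuracy.

```python
# Illustrative sketch only: replicator dynamics are an assumed choice of
# population dynamics; the source does not specify the ODE that Pop-Q uses.

# Hawk-Dove payoffs (value 2, cost 4); the mixed equilibrium is (0.5, 0.5).
HAWK_DOVE = [[-1.0, 2.0],
             [0.0, 1.0]]


def fitness(x, payoff):
    """Expected payoff of each pure strategy against population state x."""
    n = len(x)
    return [sum(payoff[i][j] * x[j] for j in range(n)) for i in range(n)]


def replicator_step(x, payoff, dt):
    """One explicit-Euler step of dx_i/dt = x_i * (f_i - f_bar)."""
    f = fitness(x, payoff)
    f_bar = sum(xi * fi for xi, fi in zip(x, f))
    new_x = [xi + dt * xi * (fi - f_bar) for xi, fi in zip(x, f)]
    total = sum(new_x)
    # Renormalise to guard against floating-point drift off the simplex.
    return [v / total for v in new_x]


def run_dynamics(x0, payoff, iterations, dt=0.1):
    """Iterate the population state for a fixed number of Euler steps."""
    x = list(x0)
    for _ in range(iterations):
        x = replicator_step(x, payoff, dt)
    return x


def nash_gap(x, payoff):
    """Best pure-strategy payoff minus the population average payoff.

    The gap is zero at a Nash equilibrium of the symmetric game.
    """
    f = fitness(x, payoff)
    f_bar = sum(xi * fi for xi, fi in zip(x, f))
    return max(f) - f_bar


if __name__ == "__main__":
    start = [0.1, 0.9]
    for iterations in (10, 50, 500):
        state = run_dynamics(start, HAWK_DOVE, iterations)
        print(iterations, state, nash_gap(state, HAWK_DOVE))
```

In this toy setting, a larger iteration budget yields a population state with a smaller equilibrium gap at a proportionally higher cost, which is the kind of trade-off the abstract describes.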