Curriculum Learning With Counterfactual Group Relative Policy Advantage For Multi-agent Reinforcement Learning
2025 · Weiqiang Jin, Hongyang Du, Guizhong Liu, et al.
Abstract
Multi-agent reinforcement learning (MARL) has achieved strong performance in cooperative adversarial tasks. However, most existing methods train agents against fixed opponent strategies and therefore rely on meta-static difficulty conditions, which limits their adaptability to changing environments and often leads to suboptimal policies. Inspired by the success of curriculum learning (CL) in supervised tasks, we propose a dynamic CL framework for MARL that employs a self-adaptive difficulty adjustment mechanism. This mechanism continuously modulates opponent strength based on real-time agent training performance, allowing agents to progress from easier to more challenging scenarios. However, the dynamic nature of CL introduces instability due to nonstationary environments and sparse global rewards. To address this challenge, we develop a Counterfactual Group Relative Policy Advantage (CGRPA), which is tightly coupled with the curriculum by providing intrinsic credit si
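To make the self-adaptive difficulty adjustment described in the abstract concrete, here is a minimal Python sketch. It is not the authors' implementation: the class name `DifficultyScheduler`, the use of a sliding-window win rate as the "real-time training performance" signal, the discrete difficulty levels, and the thresholds `raise_at` and `lower_at` are all assumptions chosen for illustration.

```python
from collections import deque


class DifficultyScheduler:
    """Hypothetical win-rate-driven controller for opponent strength."""

    def __init__(self, levels=5, window=20, raise_at=0.7, lower_at=0.3):
        self.levels = levels        # number of discrete opponent strengths (assumed)
        self.level = 0              # start against the weakest opponent
        self.raise_at = raise_at    # win rate at or above this raises difficulty (assumed)
        self.lower_at = lower_at    # win rate at or below this lowers difficulty (assumed)
        self.results = deque(maxlen=window)  # most recent episode outcomes

    def update(self, won: bool) -> int:
        """Record one episode outcome and return the difficulty level to use next."""
        self.results.append(1.0 if won else 0.0)
        # Only adjust once the window is full, so the estimate is not too noisy.
        if len(self.results) == self.results.maxlen:
            win_rate = sum(self.results) / len(self.results)
            if win_rate >= self.raise_at and self.level < self.levels - 1:
                self.level += 1
                self.results.clear()  # re-estimate performance at the new level
            elif win_rate <= self.lower_at and self.level > 0:
                self.level -= 1
                self.results.clear()
        return self.level
```

In a training loop, one would call `update(won)` after each episode and configure the opponent with the returned level. Clearing the window after every change is a design choice in this sketch: it prevents outcomes gathered at the previous difficulty from triggering another adjustment immediately.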
Related papers
- Towards Skilled Population Curriculum For Multi-agent Reinforcement Learning (2023) 0.00
- Learning Progress Driven Multi-agent Curriculum (2022) 0.00
- Robustness To Multi-modal Environment Uncertainty In MARL Using Curriculum Learning (2023) 0.00
- HyperMARL: Adaptive Hypernetworks For Multi-agent RL (2024) 0.00
- Stackelberg Games For Learning Emergent Behaviors During Competitive Autocurricula (2023) 5.84
- Cooperative And Competitive Biases For Multi-agent Reinforcement Learning (2021) 2.26
- Accelerate Multi-agent Reinforcement Learning In Zero-sum Games With Subgame Curriculum Learning (2023) 0.00
- Cooperative Game-theoretic Credit Assignment For Multi-agent Policy Gradients Via The Core (2025) 0.00