Towards Better Sample Efficiency In Multi-agent Reinforcement Learning Via Exploration
2025 · Amir Baghi, Jens Sjölund, Joakim Bergdahl, et al.
Abstract
Multi-agent reinforcement learning has shown promise in learning cooperative behaviors in team-based environments. However, such methods often demand extensive training time. For instance, the state-of-the-art method TiZero takes 40 days to train high-quality policies for a football environment. In this paper, we hypothesize that better exploration mechanisms can improve the sample efficiency of multi-agent methods. We propose two different approaches for better exploration in TiZero: a self-supervised intrinsic reward and a random network distillation bonus. Additionally, we introduce architectural modifications to the original algorithm to enhance TiZero's computational efficiency. We evaluate the sample efficiency of these approaches through extensive experiments. Our results show that random network distillation improves training sample efficiency by 18.8% compared to the original TiZero. Furthermore, we evaluate the qualitative behavior of the models produced by both variants agai
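As a rough illustration of the random network distillation bonus named in the abstract, the sketch below pairs a fixed random target network with a trained predictor and uses the prediction error as an intrinsic reward. It is not TiZero's or the authors' implementation: the class name `RNDBonus`, the single-layer networks, the feature size, and the learning rate are all assumptions chosen to keep the example small.

```python
import numpy as np


class RNDBonus:
    """Illustrative random network distillation bonus (hypothetical, single-layer)."""

    def __init__(self, obs_dim, feat_dim=16, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed, randomly initialised target network; it is never trained.
        self.w_target = rng.normal(size=(obs_dim, feat_dim))
        # Predictor network, trained to imitate the target on visited observations.
        self.w_pred = 0.1 * rng.normal(size=(obs_dim, feat_dim))
        self.lr = lr

    def _error(self, obs):
        # Difference between predictor features and target features.
        return obs @ self.w_pred - np.tanh(obs @ self.w_target)

    def bonus(self, obs):
        # Intrinsic reward: mean squared prediction error per observation.
        return np.mean(self._error(obs) ** 2, axis=-1)

    def update(self, obs):
        # One gradient-descent step on the predictor's mean squared error.
        err = self._error(obs)
        grad = 2.0 * obs.T @ err / err.size
        self.w_pred -= self.lr * grad
```

In use, the bonus would be added to the environment reward for each observation and the predictor updated on the same batch, so that frequently visited observations earn a shrinking bonus while rarely visited ones keep a larger one.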
Related papers
- Strategically Efficient Exploration In Competitive Multi-agent Reinforcement Learning (2021) · 0.00
- Prioritized Guidance For Efficient Multi-agent Reinforcement Learning Exploration (2019) · 0.00
- Enhancing Sample Efficiency In Multi-agent RL With Uncertainty Quantification And Selective Exploration (2025) · 0.00
- Coordinated Exploration Via Intrinsic Rewards For Multi-agent Reinforcement Learning (2019) · 0.00
- Optimistic ε-greedy Exploration For Cooperative Multi-agent Reinforcement Learning (2025) · 0.00
- Sample Efficient Reinforcement Learning Via Model-ensemble Exploration And Exploitation (2021) · 0.00
- TiZero: Mastering Multi-agent Football With Curriculum Learning And Self-play (2023) · 2.26
- Learning Off-policy With Model-based Intrinsic Motivation For Active Online Exploration (2024) · 0.00