Mitigating Relative Over-generalization In Multi-agent Reinforcement Learning
2024 · Ting Zhu, Yue Jin, Jeremie Houssineau, et al.
Abstract
In decentralized multi-agent reinforcement learning, agents learning in isolation can lead to relative over-generalization (RO), where optimal joint actions are undervalued in favor of suboptimal ones. This hinders effective coordination in cooperative tasks, as agents tend to choose actions that are individually rational but collectively suboptimal. To address this issue, we introduce MaxMax Q-Learning (MMQ), which employs an iterative process of sampling and evaluating potential next states, selecting those with maximal Q-values for learning. This approach refines approximations of ideal state transitions, aligning more closely with the optimal joint policy of collaborating agents. We provide theoretical analysis supporting MMQ's potential and present empirical evaluations across various environments susceptible to RO. Our results demonstrate that MMQ frequently outperforms existing baselines, exhibiting enhanced convergence and sample efficiency.
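The sampling-and-selection step described in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: the function names `maxmax_target` and `maxmax_update`, the tabular Q-table, and the assumption that candidate next states arrive as integer indices from some sampler are all hypothetical choices made for illustration. The sketch only shows the "max over sampled next states, then max over actions" target that gives the method its name.

```python
import numpy as np


def maxmax_target(q_table, reward, candidate_next_states, gamma=0.99):
    """Return r + gamma * max over candidates s' of max_a Q(s', a).

    q_table: array of shape (n_states, n_actions) for a single agent.
    candidate_next_states: integer indices of sampled possible next states
        (assumed to come from some sampler; hypothetical here).
    """
    candidates = np.asarray(candidate_next_states, dtype=int)
    # Inner max: best local action value in each candidate next state.
    best_per_state = q_table[candidates].max(axis=1)
    # Outer max: keep the most optimistic candidate next state.
    return reward + gamma * best_per_state.max()


def maxmax_update(q_table, state, action, reward, candidate_next_states,
                  alpha=0.1, gamma=0.99):
    """Apply one in-place Q-learning step toward the max-max target."""
    target = maxmax_target(q_table, reward, candidate_next_states, gamma)
    q_table[state, action] += alpha * (target - q_table[state, action])
    return target


if __name__ == "__main__":
    q = np.zeros((3, 2))
    q[1] = [1.0, 2.0]
    q[2] = [5.0, 0.0]
    # Two sampled candidates for the next state: states 1 and 2.
    print(maxmax_update(q, state=0, action=0, reward=1.0,
                        candidate_next_states=[1, 2], gamma=0.5))
    print(q[0, 0])
```

Taking the maximum over candidate next states makes each agent's target optimistic about what its teammates will do, which is the intuition the abstract gives for countering RO. How candidates are sampled and how many are drawn are left open here.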
Related papers
- MA2QL: A Minimalist Approach To Fully Decentralized Multi-agent Reinforcement Learning (2022)
- Regularize! Don't Mix: Multi-agent Reinforcement Learning Without Explicit Centralized Structures (2021)
- Qatten: A General Framework For Cooperative Multiagent Reinforcement Learning (2020)
- CURO: Curriculum Learning For Relative Overgeneralization (2022)
- Multi-agent Advisor Q-learning (2021)
- Exploiting Inter-agent Coupling Information For Efficient Reinforcement Learning Of Cooperative LQR (2025)
- Reducing Overestimation Bias In Multi-agent Domains Using Double Centralized Critics (2019)
- Maximum Entropy Heterogeneous-agent Reinforcement Learning (2023)