How Exploration Breaks Cooperation In Shared-policy Multi-agent Reinforcement Learning
2026 · Yi-Ning Weng, Hsuan-Wei Lee
Abstract
Multi-agent reinforcement learning in dynamic social dilemmas commonly relies on parameter sharing to enable scalability. We show that in shared-policy Deep Q-Network learning, standard exploration can induce a robust and systematic collapse of cooperation even in environments where fully cooperative equilibria are stable and payoff dominant. Through controlled experiments, we demonstrate that shared DQN converges to stable but persistently low-cooperation regimes. This collapse is not caused by reward misalignment, noise, or insufficient training, but by a representational failure arising from partial observability combined with parameter coupling across heterogeneous agent states. Exploration-driven updates bias the shared representation toward locally dominant defection responses, which then propagate across agents and suppress cooperative learning. We confirm that the failure persists across network sizes, exploration schedules, and payoff structures, and disappears when parameter