Remembering The Markov Property In Cooperative MARL
2025 · Kale-Ab Abebe Tessera, Leonard Hinckeldey, Riccardo Zamboni, et al.
Abstract
Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents' behaviour. In practice, current model-free MARL algorithms use simple recurrent function approximators to address the challenge of reasoning about others using partial information. In this position paper, we argue that the empirical success of these methods is not due to effective Markov signal recovery, but rather to learning simple conventions that bypass environment observations and memory. Through a targeted case study, we show that co-adapting agents can learn brittle conventions, which then fail when partnered with non-adaptive agents. Crucially, the same models can learn grounded policies when the task design necessitates it, revealing that the issue is not a fundamental limitation of the learning models but a failure of the benchmark design. Our analysis also su
Related papers
- Probing Dec-POMDP Reasoning In Cooperative MARL (2026)
- Sample-efficient Reinforcement Learning Of Partially Observable Markov Games (2022)
- Cooperative Multi-agent RL With Communication Constraints (2026)
- Information State Embedding In Partially Observable Cooperative Multi-agent Reinforcement Learning (2020)
- Byzantine Robust Cooperative Multi-agent Reinforcement Learning As A Bayesian Game (2023)
- Common Information Based Approximate State Representations In Multi-agent Reinforcement Learning (2021)
- Learning To Model Opponent Learning (2020)
- Macro-action-based Multi-agent/robot Deep Reinforcement Learning Under Partial Observability (2022)