Probing Dec-POMDP Reasoning In Cooperative MARL
2026 · Kale-Ab Tessera, Leonard Hinckeldey, Riccardo Zamboni, et al.
Abstract
Cooperative multi-agent reinforcement learning (MARL) is typically framed as a decentralised partially observable Markov decision process (Dec-POMDP), a setting whose hardness stems from two key challenges: partial observability and decentralised coordination. Genuinely solving such tasks requires Dec-POMDP reasoning, where agents use history to infer hidden states and coordinate based on local information. Yet it remains unclear whether popular benchmarks actually demand this reasoning or permit success via simpler strategies. We introduce a diagnostic suite combining statistically grounded performance comparisons and information-theoretic probes to audit the behavioural complexity of baseline policies (IPPO and MAPPO) across 37 scenarios spanning MPE, SMAX, Overcooked, Hanabi, and MaBrax. Our diagnostics reveal that success on these benchmarks rarely requires genuine Dec-POMDP reasoning. Reactive policies match the performance of memory-based agents in over half the scenarios, and em
Related papers
- Remembering The Markov Property In Cooperative MARL (2025)
- Sample-efficient Reinforcement Learning Of Partially Observable Markov Games (2022)
- Common Information Based Approximate State Representations In Multi-agent Reinforcement Learning (2021)
- Macro-action-based Multi-agent/robot Deep Reinforcement Learning Under Partial Observability (2022)
- Byzantine Robust Cooperative Multi-agent Reinforcement Learning As A Bayesian Game (2023)
- Benchmarking Multi-agent Deep Reinforcement Learning Algorithms In Cooperative Tasks (2020)
- Off-belief Learning (2021)
- Revisiting Some Common Practices In Cooperative Multi-agent Reinforcement Learning (2022)