On The Fundamental Limitations Of Decentralized Learnable Reward Shaping In Cooperative Multi-agent Reinforcement Learning
2025 · Aditya Akella
Abstract
Recent advances in learnable reward shaping have shown promise in single-agent reinforcement learning by automatically discovering effective feedback signals. However, the effectiveness of decentralized learnable reward shaping in cooperative multi-agent settings remains poorly understood. We propose DMARL-RSA, a fully decentralized system where each agent learns individual reward shaping, and evaluate it on cooperative navigation tasks in the simple_spread_v3 environment. Despite sophisticated reward learning, DMARL-RSA achieves only -24.20 +/- 0.09 average reward, compared to MAPPO with centralized training at 1.92 +/- 0.87 -- a 26.12-point gap. DMARL-RSA performs similarly to simple independent learning (IPPO: -23.19 +/- 0.96), indicating that advanced reward shaping cannot overcome fundamental decentralized coordination limitations. Interestingly, decentralized methods achieve higher landmark coverage (0.888 +/- 0.029 for DMARL-RSA, 0.960 +/- 0.045 for IPPO out of 3 total) but wors
Related papers
- Fully Decentralized Cooperative Multi-agent Reinforcement Learning: A Survey (2024)
- Learning To Shape Rewards Using A Game Of Two Partners (2021)
- Mean-field Multi-agent Reinforcement Learning: A Decentralized Network Approach (2021)
- Centralized Reward Agent For Knowledge Sharing And Transfer In Multi-task Reinforcement Learning (2024)
- From Centralized To Self-supervised: Pursuing Realistic Multi-agent Reinforcement Learning (2023)
- Contextual Knowledge Sharing In Multi-agent Reinforcement Learning With Decentralized Communication And Coordination (2025)
- Locality Matters: A Scalable Value Decomposition Approach For Cooperative Multi-agent Reinforcement Learning (2021)
- Revisiting Some Common Practices In Cooperative Multi-agent Reinforcement Learning (2022)