Decomposing Communication Gain And Delay Cost Under Cross-timestep Delays In Cooperative Multi-agent Reinforcement Learning

Abstract

Communication is essential for coordination in *cooperative* multi-agent reinforcement learning under partial observability, yet *cross-timestep* delays cause messages to arrive multiple timesteps after generation, inducing temporal misalignment and making information stale when consumed. We formalize this setting as a delayed-communication partially observable Markov game (DeComm-POMG) and decompose a message's effect into *communication gain* and *delay cost*, yielding the Communication Gain and Delay Cost (CGDC) metric. We further establish a value-loss bound showing that the degradation induced by delayed messages is upper-bounded by a discounted accumulation of an information gap between the action distributions induced by timely versus delayed messages. Guided by CGDC, we propose **CDCMA**, an actor-critic framework that requests messages only when predicted CGDC is positive, predicts future observations to reduce misalignment at consumption, and fuses delayed messa
