Sample And Communication Efficient Fully Decentralized MARL Policy Evaluation Via A New Approach: Local TD Update

Abstract

In the actor-critic framework for fully decentralized multi-agent reinforcement learning (MARL), one of the key components is the MARL policy evaluation (PE) problem, where a set of \(N\) agents work cooperatively to evaluate the value function of the global states for a given policy by communicating with their neighbors. In MARL-PE, a critical challenge is how to lower the sample and communication complexities, which are defined as the number of training samples and communication rounds needed to converge to some \(\epsilon\)-stationary point. To lower communication complexity in MARL-PE, a "natural" idea is to perform multiple local TD-update steps between consecutive rounds of communication to reduce the communication frequency. However, the validity of the local TD-update approach remains unclear due to the potential "agent-drift" phenomenon resulting from heterogeneous rewards across agents in general. This leads to an interesting open question: Can the local TD-update app
