Sample And Communication Efficient Fully Decentralized MARL Policy Evaluation Via A New Approach: Local TD Update
2024 · Fnu Hairi, Zifan Zhang, Jia Liu
Abstract
In the actor-critic framework for fully decentralized multi-agent reinforcement learning (MARL), one of the key components is the MARL policy evaluation (PE) problem, where a set of \(N\) agents work cooperatively to evaluate the value function of the global states for a given policy by communicating with their neighbors. In MARL-PE, a critical challenge is how to lower the sample and communication complexities, which are defined as the number of training samples and communication rounds needed to converge to some \(\epsilon\)-stationary point. To lower communication complexity in MARL-PE, a "natural" idea is to perform multiple local TD-update steps between consecutive rounds of communication to reduce the communication frequency. However, the validity of the local TD-update approach remains unclear due to the potential "agent-drift" phenomenon resulting from heterogeneous rewards across agents in general. This leads to an interesting open question: Can the local TD-update app
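The local TD-update idea described above can be illustrated with a toy sketch. The abstract does not spell out the algorithm, so everything concrete here is an assumption made for illustration: tabular value estimates, a random walk on a ring of states standing in for the fixed policy, per-agent reward tables, and a user-supplied mixing matrix for the communication step. The function names (`td_step`, `consensus`, `local_td`) are hypothetical and are not taken from the paper.

```python
import random


def td_step(w, s, r, s_next, alpha, gamma):
    """One tabular TD(0) update of a single agent's value estimate, in place."""
    td_error = r + gamma * w[s_next] - w[s]
    w[s] += alpha * td_error


def consensus(weights, mixing):
    """One communication round: agent i takes a weighted average of all
    agents' estimates, using row i of the (assumed) mixing matrix."""
    n, d = len(weights), len(weights[0])
    return [
        [sum(mixing[i][j] * weights[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]


def local_td(rewards, mixing, num_states, rounds, local_steps,
             alpha=0.1, gamma=0.9, seed=0):
    """Run `rounds` communication rounds, each preceded by `local_steps`
    local TD(0) updates per agent.

    rewards[i][s] is agent i's local reward in global state s (heterogeneous
    across agents). All agents observe the same global state transitions.
    """
    rng = random.Random(seed)
    n = len(rewards)
    weights = [[0.0] * num_states for _ in range(n)]
    s = 0
    for _ in range(rounds):
        # Local phase: no communication, each agent uses only its own reward.
        for _ in range(local_steps):
            s_next = (s + rng.choice([-1, 1])) % num_states  # toy ring walk
            for i in range(n):
                td_step(weights[i], s, rewards[i][s], s_next, alpha, gamma)
            s = s_next
        # Communication phase: a single averaging step over the network.
        weights = consensus(weights, mixing)
    return weights
```

Calling, for example, `local_td(rewards, mixing, num_states=4, rounds=50, local_steps=5)` with three agents uses 50 communication rounds for 250 samples, rather than one round per sample. During the local phase the agents' estimates move apart because their rewards differ, which is the "agent-drift" concern raised in the abstract; the averaging step pulls them back together. In this toy setting (shared transitions, exact uniform averaging) the drift cancels exactly, which should not be read as a statement about the general case the paper analyzes.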
Related papers
- Taming Multi-agent Reinforcement Learning With Estimator Variance Reduction (2022)
- Finite-sample Analysis Of Decentralized Temporal-difference Learning With Linear Function Approximation (2019)
- Provably Efficient Multi-agent Reinforcement Learning With Fully Decentralized Communication (2021)
- Towards Global Optimality In Cooperative MARL With The Transformation And Distillation Framework (2022)
- Cooperative Multi-agent RL With Communication Constraints (2026)
- Fully Decentralized Multi-agent Reinforcement Learning With Networked Agents (2018)
- Multi-agent Off-policy TD Learning: Finite-time Analysis With Near-optimal Sample Complexity And Communication Complexity (2021)
- On Improving Model-free Algorithms For Decentralized Multi-agent Reinforcement Learning (2021)