Multi-agent Policy Optimization With Approximatively Synchronous Advantage Estimation
2020 · Lipeng Wan, Xuwei Song, Xuguang Lan, et al.
Abstract
Cooperative multi-agent tasks require agents to deduce their own contributions from shared global rewards, a challenge known as credit assignment. Policy-based multi-agent reinforcement learning methods generally address this challenge by introducing differentiated value functions or advantage functions for individual agents. In a multi-agent system, the policies of different agents need to be evaluated jointly. To update policies synchronously, such value functions or advantage functions also need synchronous evaluation. However, in current methods, value functions or advantage functions use counterfactual joint actions that are evaluated asynchronously, and thus suffer from natural estimation bias. In this work, we propose approximatively synchronous advantage estimation. We first derive the marginal advantage function, an extension of the single-agent advantage function to multi-agent systems. Furthermore, we introduce a policy approximation for synchronous advantage estimation, a
Related papers
- Provably Efficient Cooperative Multi-agent Reinforcement Learning With Function Approximation (2021)
- Distributed Value Function Approximation For Collaborative Multi-agent Reinforcement Learning (2020)
- Cooperative Multi-agent Policy Gradients With Sub-optimal Demonstration (2018)
- Asynchronous Stochastic Approximations With Asymptotically Biased Errors And Deep Multi-agent Learning (2018)
- Asynchronous, Option-based Multi-agent Policy Gradient: A Conditional Reasoning Approach (2022)
- Fast Multi-agent Temporal-difference Learning Via Homotopy Stochastic Primal-dual Optimization (2019)
- Local Optimization Achieves Global Optimality In Multi-agent Reinforcement Learning (2023)
- Direct Advantage Estimation (2021)