Asynchronous, Option-based Multi-agent Policy Gradient: A Conditional Reasoning Approach
2022 · Xubo Lyu, Amin Banitalebi-Dehkordi, Mo Chen, et al.
Abstract
Cooperative multi-agent problems often require coordination between agents, which can be achieved through a centralized policy that considers the global state. Multi-agent policy gradient (MAPG) methods are commonly used to learn such policies, but they are often limited to problems with low-level action spaces. In complex problems with large state and action spaces, it is advantageous to extend MAPG methods to use higher-level actions, also known as options, to improve the policy search efficiency. However, multi-robot option executions are often asynchronous, that is, agents may select and complete their options at different time steps. This makes it difficult for MAPG methods to derive a centralized policy and evaluate its gradient, because a centralized policy always selects new options at the same time. In this work, we propose a novel conditional reasoning approach to address this problem and demonstrate its effectiveness on representative option-based multi-agent cooperative tasks thro
Related papers
- Optimistic Multi-agent Policy Gradient (2023) 0.00
- TAPE: Leveraging Agent Topology For Cooperative Multi-agent Policy Gradient (2023) 3.58
- Learning Explicit Credit Assignment For Cooperative Multi-agent Reinforcement Learning Via Polarization Policy Gradient (2022) 4.52
- A Policy Gradient Algorithm For Learning To Learn In Multiagent Reinforcement Learning (2020) 0.00
- Multi-agent Cooperation Through Learning-aware Policy Gradients (2024) 0.00
- Counterfactual Multi-agent Policy Gradients (2017) 0.00
- Descent-guided Policy Gradient For Scalable Cooperative Multi-agent Learning (2026) 0.00
- Settling The Variance Of Multi-agent Policy Gradients (2021) 0.00