Macro-action-based Deep Multi-agent Reinforcement Learning
2020 · Yuchen Xiao, Joshua Hoffman, Christopher Amato
Abstract
In real-world multi-robot systems, performing high-quality, collaborative behaviors requires robots to asynchronously reason about high-level action selection at varying time durations. Macro-Action Decentralized Partially Observable Markov Decision Processes (MacDec-POMDPs) provide a general framework for asynchronous decision making under uncertainty in fully cooperative multi-agent tasks. However, multi-agent deep reinforcement learning methods have only been developed for (synchronous) primitive-action problems. This paper proposes two Deep Q-Network (DQN) based methods for learning decentralized and centralized macro-action-value functions with novel macro-action trajectory replay buffers introduced for each case. Evaluations on benchmark problems and a larger domain demonstrate the advantage of learning with macro-actions over primitive-actions and the scalability of our approaches.
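As a rough illustration of what a macro-action replay entry might hold, the sketch below collapses one agent's primitive steps into a single macro-action transition and forms a duration-aware TD target. It follows the standard semi-MDP (options) convention of discounting by the macro-action's duration. It is not the paper's buffer design, which the abstract does not specify. All names here (`MacroTransition`, `MacroActionAccumulator`, `td_target`) are hypothetical and introduced only for this example.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class MacroTransition:
    obs: Any           # observation when the macro-action started
    macro_action: int  # index of the macro-action that was executed
    reward: float      # discounted reward accumulated while it ran
    next_obs: Any      # observation when the macro-action terminated
    duration: int      # number of primitive steps the macro-action took


class MacroActionAccumulator:
    """Collapses one agent's primitive steps into macro-action transitions.

    Hypothetical helper: an assumption for illustration, not the authors' code.
    """

    def __init__(self, gamma: float = 0.95) -> None:
        self.gamma = gamma
        self.transitions: List[MacroTransition] = []
        self._obs: Any = None
        self._macro: Optional[int] = None
        self._reward = 0.0
        self._steps = 0

    def begin(self, obs: Any, macro_action: int) -> None:
        """Record the start of a new macro-action."""
        self._obs = obs
        self._macro = macro_action
        self._reward = 0.0
        self._steps = 0

    def step(self, reward: float, next_obs: Any,
             terminated: bool) -> Optional[MacroTransition]:
        """Add one primitive step; emit a transition once the macro-action ends."""
        if self._macro is None:
            raise RuntimeError("call begin() before step()")
        self._reward += (self.gamma ** self._steps) * reward
        self._steps += 1
        if not terminated:
            return None
        transition = MacroTransition(
            self._obs, self._macro, self._reward, next_obs, self._steps
        )
        self.transitions.append(transition)
        self._macro = None
        return transition


def td_target(t: MacroTransition, max_next_q: float, gamma: float) -> float:
    """Semi-MDP style target: bootstrap term discounted by gamma ** duration."""
    return t.reward + (gamma ** t.duration) * max_next_q
```

In a decentralized setting each agent would keep its own accumulator, because macro-actions start and end at different times for different agents. How a centralized learner should align those asynchronous terminations is a separate design question that this sketch leaves open.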
Related papers
- Macro-action-based Multi-agent/robot Deep Reinforcement Learning Under Partial Observability (2022)
- Deep Multi-agent Reinforcement Learning With Discrete-continuous Hybrid Action Spaces (2019)
- Reusability And Transferability Of Macro Actions For Reinforcement Learning (2019)
- A Further Exploration Of Deep Multi-agent Reinforcement Learning With Hybrid Action Space (2022)
- MACRPO: Multi-agent Cooperative Recurrent Policy Optimization (2021)
- Hierarchical Meta-reinforcement Learning Via Automated Macro-action Discovery (2024)
- A Compression-inspired Framework For Macro Discovery (2017)
- Centralized Model And Exploration Policy For Multi-agent RL (2021)