Macro-action-based Multi-agent/robot Deep Reinforcement Learning Under Partial Observability
2022 · Yuchen Xiao
Abstract
State-of-the-art multi-agent reinforcement learning (MARL) methods have provided promising solutions to a variety of complex problems. Yet these methods all assume that agents execute primitive actions synchronously, so they do not genuinely scale to long-horizon real-world multi-agent/robot tasks, which inherently require agents/robots to reason asynchronously about high-level action selection over varying time durations. The Macro-Action Decentralized Partially Observable Markov Decision Process (MacDec-POMDP) is a general formalization for asynchronous decision-making under uncertainty in fully cooperative multi-agent tasks. In this thesis, we first propose a group of value-based RL approaches for MacDec-POMDPs, where agents are allowed to perform asynchronous learning and decision-making with macro-action-value functions in three paradigms: decentralized learning and control, centralized learning and control, and centralized training for decentralized execution (C
Related papers
- Macro-action-based Deep Multi-agent Reinforcement Learning (2020)
- Deep Decentralized Multi-task Multi-agent Reinforcement Learning Under Partial Observability (2017)
- Optimal Decision-making In Mixed-agent Partially Observable Stochastic Environments Via Reinforcement Learning (2019)
- Centralized Model And Exploration Policy For Multi-agent RL (2021)
- Sample-efficient Reinforcement Learning Of Partially Observable Markov Games (2022)
- MACRPO: Multi-agent Cooperative Recurrent Policy Optimization (2021)
- Networked Agents In The Dark: Team Value Learning Under Partial Observability (2025)
- Remembering The Markov Property In Cooperative MARL (2025)