Scalable Reinforcement Learning Policies For Multi-agent Control
2020 · Christopher D. Hsu, Heejin Jeong, George J. Pappas, et al.
Abstract
We develop a Multi-Agent Reinforcement Learning (MARL) method to learn scalable control policies for target tracking. Our method can handle an arbitrary number of pursuers and targets; we show results for tasks consisting of up to 1000 pursuers tracking 1000 targets. We use a decentralized, partially-observable Markov Decision Process framework to model pursuers as agents receiving partial observations (range and bearing) about targets that move using fixed, unknown policies. An attention mechanism is used to parameterize the value function of the agents; this mechanism allows us to handle an arbitrary number of targets. Entropy-regularized off-policy RL methods are used to train a stochastic policy, and we discuss how it enables a hedging behavior between pursuers that leads to a weak form of cooperation in spite of completely decentralized control execution. We further develop a masking heuristic that allows training on smaller problems with few pursuers and targets and execution on much l
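The abstract's point that attention lets a value function accept any number of targets can be illustrated with a small sketch. This is not the authors' architecture: the function name `masked_attention_pool`, the use of raw (range, bearing) pairs as both keys and values, and the hand-picked query vector are all assumptions made for illustration; a learned model would use trained projections. The mask here only shows how entries can be excluded from the pooling, and is not a reconstruction of the paper's masking heuristic, whose details are not given in this excerpt.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def masked_attention_pool(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    mask: Sequence[bool],
) -> List[float]:
    """Scaled dot-product attention over a variable-length set of targets.

    Returns a fixed-size vector regardless of how many targets are given.
    Entries whose mask is False receive zero attention weight.
    (Hypothetical helper; not taken from the paper.)
    """
    dim = len(values[0]) if values else 0
    visible = [i for i, keep in enumerate(mask) if keep]
    if not visible:
        # No visible targets: return a zero summary vector.
        return [0.0] * dim

    scale = math.sqrt(len(query))
    scores = {i: dot(query, keys[i]) / scale for i in visible}

    # Numerically stable softmax restricted to the visible targets.
    top = max(scores.values())
    exps = {i: math.exp(s - top) for i, s in scores.items()}
    total = sum(exps.values())

    pooled = [0.0] * dim
    for i in visible:
        weight = exps[i] / total
        for d in range(dim):
            pooled[d] += weight * values[i][d]
    return pooled


# Assumed toy observations: one (range, bearing) pair per target.
query = [1.0, 0.0]
targets = [[2.0, 0.5], [1.0, -0.3], [4.0, 1.2]]

pooled_all = masked_attention_pool(query, targets, targets, [True, True, True])
pooled_shuffled = masked_attention_pool(
    query, targets[::-1], targets[::-1], [True, True, True]
)
pooled_masked = masked_attention_pool(query, targets, targets, [True, False, False])
```

The output has the same length whether one target or a thousand are passed in, and reordering the targets leaves the result unchanged up to floating-point rounding, which is the property that makes this kind of pooling suitable for an arbitrary number of targets.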
Related papers
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020) 0.00
- Adversarial Search And Tracking With Multiagent Reinforcement Learning In Sparsely Observable Environment (2023) 4.52
- Hierarchical Policy-gradient Reinforcement Learning For Multi-agent Shepherding Control Of Non-cohesive Targets (2025) 0.00
- Scalable Multi-agent Reinforcement Learning For Networked Systems With Average Reward (2020) 0.00
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022) 0.00
- Heterogeneous Multi-agent Reinforcement Learning For Zero-shot Scalable Collaboration (2024) 6.34
- Faster Last-iterate Convergence Of Policy Optimization In Zero-sum Markov Games (2022) 0.00
- Multi-agent Trust Region Policy Optimization (2020) 12.61