Non-stationary Policy Learning For Multi-timescale Multi-agent Reinforcement Learning
2023 · Patrick Emami, Xiangyu Zhang, David Biagioni, et al.
Abstract
In multi-timescale multi-agent reinforcement learning (MARL), agents interact across different timescales. In general, policies for time-dependent behaviors, such as those induced by multiple timescales, are non-stationary. Learning non-stationary policies is challenging and typically requires sophisticated or inefficient algorithms. Motivated by the prevalence of this control problem in real-world complex systems, we introduce a simple framework for learning non-stationary policies for multi-timescale MARL. Our approach uses available information about agent timescales to define a periodic time encoding. In detail, we theoretically demonstrate that the effects of non-stationarity introduced by multiple timescales can be learned by a periodic multi-agent policy. To learn such policies, we propose a policy gradient algorithm that parameterizes the actor and critic with phase-functioned neural networks, which provide an inductive bias for periodicity. The framework's ability to effective
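The periodic time encoding mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact construction, so everything here is an assumption made for illustration: each agent's timescale is taken to be an integer number of environment steps, the joint period is taken to be the least common multiple of those timescales, and the phase is embedded as a sine/cosine pair. The function names `global_period` and `periodic_time_encoding` are hypothetical.

```python
import math
from functools import reduce


def global_period(timescales):
    """Least common multiple of the agents' timescales (assumed integer steps)."""
    return reduce(lambda a, b: a * b // math.gcd(a, b), timescales)


def periodic_time_encoding(t, timescales):
    """Map environment step t to a point on the unit circle.

    Assumption: the joint behavior repeats every global_period(timescales)
    steps, so two steps with the same phase receive the same encoding.
    """
    period = global_period(timescales)
    phase = (t % period) / period          # phase in [0, 1)
    angle = 2.0 * math.pi * phase
    return (math.sin(angle), math.cos(angle))


# Hypothetical example: three agents acting every 1, 5, and 15 steps.
timescales = [1, 5, 15]
enc_start = periodic_time_encoding(0, timescales)
enc_wrapped = periodic_time_encoding(15, timescales)  # one full period later
```

Under these assumptions, a policy that takes this encoding as an extra input can depend on time only through the phase, which is one simple way to obtain a periodic policy. A phase-functioned network would go further and make its weights a function of the phase; that part is not shown here.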
Related papers
- Dealing With Non-stationarity In Decentralized Cooperative Multi-agent Deep Reinforcement Learning Via Multi-timescale Learning (2023)
- Hierarchical Deep Multiagent Reinforcement Learning With Temporal Abstraction (2018)
- Multi-agent Reinforcement Learning In Stochastic Networked Systems (2020)
- A Policy Gradient Algorithm For Learning To Learn In Multiagent Reinforcement Learning (2020)
- Unsynchronized Decentralized Q-learning: Two Timescale Analysis By Persistence (2023)
- Dealing With Non-stationarity In MARL Via Trust-region Decomposition (2021)
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022)
- Global Convergence Of Localized Policy Iteration In Networked Multi-agent Reinforcement Learning (2022)