Neural Replicator Dynamics
2019 · Daniel Hennes, Dustin Morrill, Shayegan Omidshafiei, et al.
Abstract
Policy gradient and actor-critic algorithms form the basis of many commonly used training techniques in deep reinforcement learning. Using these algorithms in multiagent environments poses problems such as nonstationarity and instability. In this paper, we first demonstrate that standard softmax-based policy gradient can be prone to poor performance in the presence of even the most benign nonstationarity. By contrast, it is known that the replicator dynamics, a well-studied model from evolutionary game theory, eliminates dominated strategies and exhibits convergence of the time-averaged trajectories to interior Nash equilibria in zero-sum games. Thus, using the replicator dynamics as a foundation, we derive an elegant one-line change to policy gradient methods that simply bypasses the gradient step through the softmax, yielding a new algorithm titled Neural Replicator Dynamics (NeuRD). NeuRD reduces to the exponential weights/Hedge algorithm in the single-state all-actions case. Additi
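To make the "one-line change" concrete, here is a minimal sketch of the single-state, all-actions case, assuming tabular logits and exact action values `q`. The function names (`spg_step`, `neurd_step`, `hedge_step`, `run`), the learning rate, and the toy reward vector are illustrative assumptions, not the authors' code, and the sketch leaves out function approximation and any safeguards the full algorithm may use.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def spg_step(logits, q, lr):
    """Softmax policy gradient (all-actions): the advantage is scaled by pi."""
    pi = softmax(logits)
    advantage = q - pi @ q
    return logits + lr * pi * advantage


def neurd_step(logits, q, lr):
    """NeuRD-style update: same advantage, but the step through the
    softmax is bypassed, so the factor pi is dropped."""
    pi = softmax(logits)
    advantage = q - pi @ q
    return logits + lr * advantage


def hedge_step(logits, q, lr):
    """Exponential weights / Hedge written on logits: accumulate values."""
    return logits + lr * q


def run(step_fn, logits, q, lr=1.0, steps=20):
    """Apply one update rule repeatedly and return the resulting policy."""
    y = np.array(logits, dtype=float)
    for _ in range(steps):
        y = step_fn(y, q, lr)
    return softmax(y)


# Toy nonstationarity (assumed setup): the policy has already committed to
# action 0, and the values now favour action 1.
start_logits = [10.0, 0.0]
q_after_switch = np.array([0.0, 1.0])

pi_spg = run(spg_step, start_logits, q_after_switch)
pi_neurd = run(neurd_step, start_logits, q_after_switch)
pi_hedge = run(hedge_step, start_logits, q_after_switch)
```

In this toy setting the NeuRD-style and Hedge updates differ only by a shift that is the same for every logit, which the softmax ignores, so `pi_neurd` and `pi_hedge` coincide. The softmax policy gradient step is multiplied by the current probability of each action, so an action whose probability has collapsed receives almost no update, while the NeuRD-style step closes the logit gap at a rate set by `lr` alone.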
Related papers
- Interpolating Between Softmax Policy Gradient And Neural Replicator Dynamics With Capped Implicit Exploration (2022) 0.00
- Parameter Sharing Deep Deterministic Policy Gradient For Cooperative Multi-agent Reinforcement Learning (2017) 0.00
- Softmax Deep Double Deterministic Policy Gradients (2020) 0.00
- A Policy Gradient Algorithm For Learning To Learn In Multiagent Reinforcement Learning (2020) 0.00
- Local Advantage Actor-critic For Robust Multi-agent Deep Reinforcement Learning (2021) 7.81
- Scalable And Sample Efficient Distributed Policy Gradient Algorithms In Multi-agent Networked Systems (2022) 0.00
- A Multi-agent Off-policy Actor-critic Algorithm For Distributed Reinforcement Learning (2019) 11.39
- Federated Natural Policy Gradient And Actor Critic Methods For Multi-task Reinforcement Learning (2023) 0.00