The Reactor: A Fast And Sample-efficient Actor-critic Agent For Reinforcement Learning
2017 · Audrunas Gruslys, Will Dabney, Mohammad Gheshlaghi Azar, et al.
Abstract
In this work we present a new agent architecture, called Reactor, which combines multiple algorithmic and architectural contributions to produce an agent with higher sample-efficiency than Prioritized Dueling DQN (Wang et al., 2016) and Categorical DQN (Bellemare et al., 2017), while giving better run-time performance than A3C (Mnih et al., 2016). Our first contribution is a new policy evaluation algorithm called Distributional Retrace, which brings multi-step off-policy updates to the distributional reinforcement learning setting. The same approach can be used to convert several classes of multi-step policy evaluation algorithms designed for expected value evaluation into distributional ones. Next, we introduce the β-leave-one-out policy gradient algorithm, which improves the trade-off between variance and bias by using action values as a baseline. Our final algorithmic contribution is a new prioritized replay algorithm for sequences, which exploits the temporal locality of nei
Related papers
- Sample Efficient Actor-critic With Experience Replay (2016)
- Sample-efficient Model-free Reinforcement Learning With Off-policy Critics (2019)
- A Multi-agent Off-policy Actor-critic Algorithm For Distributed Reinforcement Learning (2019)
- Actor-attention-critic For Multi-agent Reinforcement Learning (2018)
- Revisiting Gaussian Mixture Critics In Off-policy Reinforcement Learning: A Sample-based Approach (2022)
- Multi-agent Actor-critic For Mixed Cooperative-competitive Environments (2017)
- Neural Replicator Dynamics (2019)
- Perception-prediction-reaction Agents For Deep Reinforcement Learning (2020)