Adaptive Trade-offs In Off-policy Learning
2019 · Mark Rowland, Will Dabney, Rémi Munos
Abstract
A great variety of off-policy learning algorithms exist in the literature, and new breakthroughs in this area continue to be made, improving theoretical understanding and yielding state-of-the-art reinforcement learning algorithms. In this paper, we take a unifying view of this space of algorithms, and consider how they trade off three fundamental quantities: update variance, fixed-point bias, and contraction rate. This leads to new perspectives on existing methods, and also naturally yields novel algorithms for off-policy evaluation and control. We develop one such algorithm, C-trace, demonstrating that it makes these trade-offs more efficiently than existing methods in use, and that it can be scaled to yield state-of-the-art performance in large-scale environments.
Related papers
- Trajectory-aware Eligibility Traces For Off-policy Reinforcement Learning (2023)
- Distillation Policy Optimization (2023)
- Behaviour Policy Optimization: Provably Lower Variance Return Estimates For Off-policy Reinforcement Learning (2025)
- Improving The Efficiency Of Off-policy Reinforcement Learning By Accounting For Past Decisions (2021)
- Stable And Efficient Policy Evaluation (2020)
- Online Off-policy Prediction (2018)
- Off-policy Policy Gradient Algorithms By Constraining The State Distribution Shift (2019)
- Online Hyper-parameter Tuning In Off-policy Learning Via Evolutionary Strategies (2020)