Adaptive Tree Backup Algorithms For Temporal-difference Reinforcement Learning
2022 · Brett Daley, Isaac Chan
Abstract
Q(\(\sigma\)) is a recently proposed temporal-difference learning method that interpolates between learning from expected backups and sampled backups. It has been shown that intermediate values for the interpolation parameter \(\sigma \in [0,1]\) perform better in practice, and therefore it is commonly believed that \(\sigma\) functions as a bias-variance trade-off parameter to achieve these improvements. In our work, we disprove this notion, showing that the choice of \(\sigma=0\) minimizes variance without increasing bias. This indicates that \(\sigma\) must have some other effect on learning that is not fully understood. As an alternative, we hypothesize the existence of a new trade-off: larger \(\sigma\)-values help overcome poor initializations of the value function, at the expense of higher statistical variance. To automatically balance these considerations, we propose Adaptive Tree Backup (ATB) methods, whose weighted backups evolve as the agent gains experience. Our experiments
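The one-step Q(σ) target mixes a sampled backup (σ = 1, as in Sarsa) with an expected backup over the target policy π (σ = 0, as in Expected Sarsa and Tree Backup). The sketch below is a minimal illustration and is not the authors' code. The helper names `q_sigma_target` and `target_mean_and_variance` are hypothetical. It assumes a single step, on-policy sampling of the next action A' ~ π, and a fixed reward. Under those assumptions the target's mean does not depend on σ, while its variance over A' equals γ²σ²·Var_π[Q(s', A')]. So σ = 0 gives the lowest variance without shifting the mean, which matches the abstract's claim in this simplified setting.

```python
import numpy as np


def q_sigma_target(reward, gamma, sigma, q_next, pi_next, sampled_action):
    """One-step Q(sigma) target (hypothetical helper for illustration).

    sigma = 1.0 -> sampled backup (Sarsa-style)
    sigma = 0.0 -> expected backup over pi (Expected Sarsa / Tree Backup style)
    """
    expected = float(np.dot(pi_next, q_next))  # E_{a ~ pi}[Q(s', a)]
    sampled = float(q_next[sampled_action])    # Q(s', A') for the sampled A'
    return reward + gamma * (sigma * sampled + (1.0 - sigma) * expected)


def target_mean_and_variance(reward, gamma, sigma, q_next, pi_next):
    """Exact mean and variance of the target over A' ~ pi.

    Assumes on-policy sampling of A' and a fixed (non-random) reward;
    the paper's multi-step and off-policy analysis is not modeled here.
    """
    targets = np.array([
        q_sigma_target(reward, gamma, sigma, q_next, pi_next, a)
        for a in range(len(q_next))
    ])
    mean = float(np.dot(pi_next, targets))
    var = float(np.dot(pi_next, (targets - mean) ** 2))
    return mean, var


if __name__ == "__main__":
    q_next = np.array([1.0, 3.0])   # toy action values Q(s', .)
    pi_next = np.array([0.5, 0.5])  # toy target policy pi(. | s')
    for sigma in (0.0, 0.5, 1.0):
        print(sigma, target_mean_and_variance(0.0, 1.0, sigma, q_next, pi_next))
```

With Q(s', ·) = [1, 3] and π = [0.5, 0.5], every σ yields a mean of 2.0, while the variance grows from 0.0 (σ = 0) to 0.25 (σ = 0.5) to 1.0 (σ = 1). This toy calculation says nothing about the initialization effect or the adaptive weighting that ATB introduces.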
Related papers
- A Unified Approach For Multi-step Temporal-difference Learning With Eligibility Traces In Reinforcement Learning (2018)
- Adaptive Temporal-difference Learning For Policy Evaluation With Per-state Uncertainty Estimates (2019)
- On The Statistical Benefits Of Temporal Difference Learning (2023)
- Discerning Temporal Difference Learning (2023)
- TBQ(\(\sigma\)): Improving Efficiency Of Trace Utilization For Off-policy Reinforcement Learning (2019)
- Pseudo-quantized Actor-critic Algorithm For Robustness To Noisy Temporal Difference Error (2026)
- Reducing Variance In Temporal-difference Value Estimation Via Ensemble Of Deep Networks (2022)
- The Statistical Benefits Of Quantile Temporal-difference Learning For Value Estimation (2023)