Temporal Difference Uncertainties As A Signal For Exploration
2020 · Sebastian Flennerhag, Jane X. Wang, Pablo Sprechmann, et al.
Abstract
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy, which can yield near-optimal exploration strategies in tabular settings. However, in non-tabular settings that involve function approximators, obtaining accurate uncertainty estimates is almost as challenging a problem. In this paper, we highlight that value estimates are easily biased and temporally inconsistent. In light of this, we propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors. This exploration signal controls for state-action transitions so as to isolate uncertainty in value that is due to uncertainty over the agent's parameters. Because our measure of uncertainty conditions on state-action transitions, we cannot act on this measure directly. Instead, we incorporate it as an intrinsic reward and treat exploration as a separate learning problem, induced by the ag