Exploration Via Epistemic Value Estimation
2023 · Simon Schmitt, John Shawe-Taylor, Hado van Hasselt
Abstract
How to explore efficiently in reinforcement learning is an open problem. Many exploration algorithms employ the epistemic uncertainty of their own value predictions, for instance to compute an exploration bonus or upper confidence bound. Unfortunately, the required uncertainty is difficult to estimate in general with function approximation. We propose epistemic value estimation (EVE): a recipe that is compatible with sequential decision making and with neural network function approximators. It equips agents with a tractable posterior over all their parameters from which epistemic value uncertainty can be computed efficiently. We use the recipe to derive an epistemic Q-Learning agent and observe competitive performance on a series of benchmarks. Experiments confirm that the EVE recipe facilitates efficient exploration in hard exploration tasks.
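To make the role of epistemic value uncertainty concrete, here is a minimal sketch of an upper-confidence-bound action choice driven by a posterior over parameters. It is not the EVE recipe from the paper: it assumes a linear value function and a Gaussian parameter posterior, and the names `q_mean`, `q_epistemic_std`, `ucb_action` and `beta` are hypothetical, chosen only for illustration.

```python
import math

# Illustrative assumption: Q(s, a) = phi(s, a) . theta with theta ~ N(theta_mean, theta_cov).
# This is a toy linear-Gaussian stand-in, not the paper's neural-network posterior.

def q_mean(theta_mean, phi):
    """Posterior-mean value estimate: phi . theta_mean."""
    return sum(t * p for t, p in zip(theta_mean, phi))

def q_epistemic_std(theta_cov, phi):
    """Standard deviation of phi . theta under the posterior: sqrt(phi^T Sigma phi)."""
    sigma_phi = [sum(row[j] * phi[j] for j in range(len(phi))) for row in theta_cov]
    variance = sum(p * s for p, s in zip(phi, sigma_phi))
    return math.sqrt(max(variance, 0.0))  # clamp tiny negative round-off

def ucb_action(theta_mean, theta_cov, features_per_action, beta=1.0):
    """Return the action maximising mean + beta * epistemic std, plus all scores."""
    scores = [
        q_mean(theta_mean, phi) + beta * q_epistemic_std(theta_cov, phi)
        for phi in features_per_action
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores

# Toy posterior: action 0 looks slightly better on average,
# but the parameter governing action 1 is far more uncertain.
theta_mean = [1.0, 0.9]
theta_cov = [[0.01, 0.0], [0.0, 4.0]]
features = [[1.0, 0.0], [0.0, 1.0]]  # one feature vector per action

greedy, _ = ucb_action(theta_mean, theta_cov, features, beta=0.0)
explore, scores = ucb_action(theta_mean, theta_cov, features, beta=1.0)
```

With `beta=0.0` the choice is purely greedy on the posterior mean; with `beta=1.0` the uncertain action wins because its bonus outweighs the small gap in mean value.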
Related papers
- The Uncertainty Bellman Equation And Exploration (2017)
- Uncertainty Quantification And Exploration For Reinforcement Learning (2019)
- Query The Agent: Improving Sample Efficiency Through Epistemic Uncertainty Estimation (2022)
- Efficient Exploration With Double Uncertain Value Networks (2017)
- Temporal Difference Uncertainties As A Signal For Exploration (2020)
- Exploration Via Elliptical Episodic Bonuses (2022)
- Learning-driven Exploration For Reinforcement Learning (2019)
- Model-based Epistemic Variance Of Values For Risk-aware Policy Optimization (2023)