Parameterized Indexed Value Function For Efficient Exploration In Reinforcement Learning
2019 · Tian Tan, Zhihan Xiong, Vikranth R. Dwaracherla
Abstract
It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn a parameterized indexed value function through a distributional version of temporal difference in a tabular setting and prove its regret bound. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network to learn the indexed value function. Finally, we show the efficacy of PINs through computatio
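To make the idea of an indexed value function with separate mean and uncertainty components concrete, here is a minimal tabular sketch. It assumes, for illustration only, that the indexed value takes the form Q_z(s, a) = mean(s, a) + z * uncertainty(s, a) with a scalar index z drawn from a standard normal once per episode; the function names, table values, and sampling distribution are hypothetical and are not taken from the paper.

```python
import numpy as np

def indexed_q(mean, uncertainty, z):
    """Indexed value Q_z(s, a) = mean(s, a) + z * uncertainty(s, a) (assumed form)."""
    return mean + z * uncertainty

def sample_index(rng):
    """Draw one scalar index z ~ N(0, 1); assumed to stay fixed for an episode."""
    return float(rng.standard_normal())

def greedy_action(mean, uncertainty, z, state):
    """Act greedily with respect to the value function selected by index z."""
    return int(np.argmax(indexed_q(mean, uncertainty, z)[state]))

# Toy tables for 2 states and 3 actions (values chosen only for illustration).
mean = np.array([[1.0, 0.5, 0.0],
                 [0.2, 0.2, 0.9]])
uncertainty = np.array([[0.1, 1.0, 0.1],
                        [0.5, 0.1, 0.1]])

rng = np.random.default_rng(0)
z = sample_index(rng)  # one index per episode
action = greedy_action(mean, uncertainty, z, state=0)
```

In this sketch, an index near zero reproduces the greedy action under the mean estimate, while a large positive index favors actions whose estimates are more uncertain. That is how sampling a single index can drive exploration without maintaining an ensemble of value functions.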
Related papers
- Efficient Exploration With Double Uncertain Value Networks (2017) 0.00
- Uncertainty Quantification And Exploration For Reinforcement Learning (2019) 6.77
- Exploration Via Epistemic Value Estimation (2023) 2.26
- Information-directed Exploration For Deep Reinforcement Learning (2018) 0.00
- Diverse Randomized Value Functions: A Provably Pessimistic Approach For Offline Reinforcement Learning (2024) 3.58
- Temporal Difference Uncertainties As A Signal For Exploration (2020) 0.00
- MEET: A Monte Carlo Exploration-exploitation Trade-off For Buffer Sampling (2022) 2.26
- Value-distributional Model-based Reinforcement Learning (2023) 1.56