Adaptive Exploration For Data-efficient General Value Function Evaluations
2024 · Arushi Jain, Josiah P. Hanna, Doina Precup
Abstract
General Value Functions (GVFs) (Sutton et al., 2011) represent predictive knowledge in reinforcement learning. Each GVF computes the expected return for a given policy, based on a unique reward. Existing methods relying on fixed behavior policies or pre-collected data often face data efficiency issues when learning multiple GVFs in parallel using off-policy methods. To address this, we introduce GVFExplorer, which adaptively learns a single behavior policy that efficiently collects data for evaluating multiple GVFs in parallel. Our method optimizes the behavior policy by minimizing the total variance in return across GVFs, thereby reducing the required environmental interactions. We use an existing temporal-difference-style variance estimator to approximate the return variance. We prove that each behavior policy update decreases the overall mean squared error in GVF predictions. We empirically show our method's performance in tabular and nonlinear function approximation settings, inclu
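The abstract describes two pieces: a temporal-difference-style estimate of the return variance, and a behavior policy chosen to minimize the total variance across GVFs. The following is a minimal tabular sketch of how those pieces could fit together, not the authors' implementation. The specific rule used here (behavior probability proportional to the square root of the summed, policy-weighted variances), the fallback to the average target policy, and all function and parameter names are assumptions made for illustration.

```python
import math


def behavior_from_variances(target_probs, variances, floor=1e-8):
    """Behavior distribution over actions for a single state.

    target_probs[i][a]: probability of action a under target policy i.
    variances[i][a]:    estimated return variance of GVF i after action a.

    Assumed rule (hypothetical, for illustration):
        mu(a) proportional to sqrt(sum_i pi_i(a)^2 * M_i(a))
    """
    n_actions = len(target_probs[0])
    scores = []
    for a in range(n_actions):
        total = sum(
            p[a] ** 2 * max(m[a], 0.0)  # clip negative estimates to zero
            for p, m in zip(target_probs, variances)
        )
        scores.append(math.sqrt(total))

    norm = sum(scores)
    if norm < floor:
        # No variance signal yet: fall back to the average target policy.
        k = len(target_probs)
        return [sum(p[a] for p in target_probs) / k for a in range(n_actions)]
    return [s / norm for s in scores]


def td_variance_update(m_sa, td_error, rho_next, m_next, gamma=0.99, lr=0.1):
    """One TD-style step for a variance estimate M(s, a).

    The squared TD error of the value estimate plays the role of the reward,
    and the bootstrap term is discounted by (gamma * rho_next)^2, where
    rho_next is the importance ratio pi(a'|s') / mu(a'|s').
    """
    target = td_error ** 2 + (gamma * rho_next) ** 2 * m_next
    return m_sa + lr * (target - m_sa)


# Two GVFs, two actions. Only the first GVF has a variance estimate so far.
pi = [[0.5, 0.5], [0.9, 0.1]]
M = [[1.0, 4.0], [0.0, 0.0]]
mu = behavior_from_variances(pi, M)
```

In this example the second action has four times the estimated variance under the first GVF, so the sketch assigns it twice the sampling probability (1/3 versus 2/3), which is the intended effect: more data is collected where the returns are harder to estimate.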
Related papers
- Robust And Adaptive Temporal-difference Learning Using An Ensemble Of Gaussian Processes (2021)
- Unifying Value Iteration, Advantage Learning, And Dynamic Policy Programming (2017)
- Guarantees For Epsilon-greedy Reinforcement Learning With Function Approximation (2022)
- Rethinking Value Function Learning For Generalization In Reinforcement Learning (2022)
- What About Inputting Policy In Value Function: Policy Representation And Policy-extended Value Function Approximator (2020)
- Finding Useful Predictions By Meta-gradient Descent To Improve Decision-making (2021)
- Learning Value Functions In Deep Policy Gradients Using Residual Variance (2020)
- Provably Efficient Reward-agnostic Navigation With Linear Value Iteration (2020)