A Generalized Projected Bellman Error For Off-policy Value Estimation In Reinforcement Learning
2021 · Andrew Patterson, Adam White, Martha White
Abstract
Many reinforcement learning algorithms rely on value estimation; however, the most widely used algorithms -- namely temporal difference algorithms -- can diverge under both off-policy sampling and nonlinear function approximation. Many algorithms have been developed for off-policy value estimation based on the linear mean squared projected Bellman error (MSPBE) and are sound under linear function approximation. Extending these methods to the nonlinear case has been largely unsuccessful. Recently, several methods have been introduced that approximate a different objective -- the mean squared Bellman error (MSBE) -- which naturally facilitate nonlinear approximation. In this work, we build on these insights and introduce a new generalized MSPBE that extends the linear MSPBE to the nonlinear setting. We show how this generalized objective unifies previous work and obtain new bounds for the value error of the solutions of the generalized objective. We derive an easy-to-use, but sound, algo
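For orientation, the two objectives named in the abstract can be written in their standard linear form. This is a sketch in assumed notation that does not appear in the text above: features $\Phi$, weights $w$, value estimate $v_w = \Phi w$, state weighting $d$ with $D = \mathrm{diag}(d)$, discount $\gamma$, and Bellman operator $T$ for the target policy $\pi$. It does not reproduce the generalized objective the paper introduces.

```latex
% Assumed notation (not from the abstract): v_w = \Phi w, D = diag(d).
\begin{align*}
(T v)(s) &= \mathbb{E}_\pi\!\left[ R + \gamma\, v(S') \,\middle|\, S = s \right] \\
\mathrm{MSBE}(w) &= \lVert T v_w - v_w \rVert_D^2
  = \sum_s d(s)\,\bigl( (T v_w)(s) - v_w(s) \bigr)^2 \\
\mathrm{MSPBE}(w) &= \lVert \Pi\, T v_w - v_w \rVert_D^2,
  \qquad \Pi = \Phi \left( \Phi^\top D\, \Phi \right)^{-1} \Phi^\top D
\end{align*}
```

The projection $\Pi$ maps the Bellman backup onto the span of the features, which is why this form of the MSPBE is tied to linear function approximation.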