Is The Bellman Residual A Bad Proxy?
2016 · Matthieu Geist, Bilal Piot, Olivier Pietquin
Abstract
This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, that are usually designed to maximize the mean value, and derive a method that minimizes the residual \(\|T_* v_\pi - v_\pi\|_{1,\nu}\) over policies. A theoretical analysis shows how good this proxy is to policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy to policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly address
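To make the two criteria in the abstract concrete, here is a minimal sketch that evaluates both for a fixed policy on a small tabular MDP. It is an illustration under stated assumptions, not the authors' method: the function names, the array layout (`P[a, s, t]`, `R[s, a]`, `pi[s, a]`), the discount `gamma`, and the Dirichlet-sampled toy MDP are all hypothetical choices made here, and the mean value is taken as the \(\nu\)-weighted average of \(v_\pi\).

```python
import numpy as np

def policy_value(P, R, pi, gamma):
    """Exact v_pi = (I - gamma * P_pi)^{-1} r_pi for a stochastic policy pi[s, a]."""
    n_states = R.shape[0]
    P_pi = np.einsum("sa,ast->st", pi, P)  # state-to-state kernel under pi
    r_pi = np.sum(pi * R, axis=1)          # expected one-step reward under pi
    return np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

def optimal_bellman_operator(P, R, v, gamma):
    """(T_* v)(s) = max_a [ R(s, a) + gamma * sum_t P(t | s, a) v(t) ]."""
    q = R + gamma * np.einsum("ast,t->sa", P, v)
    return q.max(axis=1)

def mean_value(P, R, pi, nu, gamma):
    """Criterion i): the nu-weighted mean of v_pi (to be maximized)."""
    return float(nu @ policy_value(P, R, pi, gamma))

def bellman_residual(P, R, pi, nu, gamma):
    """Criterion ii): ||T_* v_pi - v_pi||_{1,nu} (to be minimized)."""
    v = policy_value(P, R, pi, gamma)
    return float(nu @ np.abs(optimal_bellman_operator(P, R, v, gamma) - v))

# Hypothetical toy MDP: 5 states, 3 actions, random transitions and rewards.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, :] sums to 1
R = rng.uniform(size=(n_states, n_actions))
nu = np.full(n_states, 1.0 / n_states)                  # uniform state distribution
uniform_pi = np.full((n_states, n_actions), 1.0 / n_actions)

print("mean value      :", mean_value(P, R, uniform_pi, nu, gamma))
print("Bellman residual:", bellman_residual(P, R, uniform_pi, nu, gamma))
```

Both quantities are functions of the policy alone, which is what lets either one serve as a policy search objective. The residual vanishes exactly when \(v_\pi\) is a fixed point of \(T_*\), that is, when \(\pi\) is optimal.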
Related papers
- Towards Blackwell Optimality: Bellman Optimality Is All You Can Get (2025) 0.00
- Stability And Generalization For Bellman Residuals (2025) 0.00
- Towards Optimal Adversarial Robust Q-learning With Bellman Infinity-error (2024) 0.00
- A Kernel Loss For Solving The Bellman Equation (2019) 0.00
- Proximal Bellman Mappings For Reinforcement Learning And Their Application To Robust Adaptive Filtering (2023) 2.26
- Kalman Meets Bellman: Improving Policy Evaluation Through Value Tracking (2020) 0.00
- Revisiting The Softmax Bellman Operator: New Benefits And New Perspective (2018) 0.00
- Beyond Variance Reduction: Understanding The True Impact Of Baselines On Policy Optimization (2020) 0.00