Robust Losses For Learning Value Functions
2022 · Andrew Patterson, Victor Liao, Martha White
Abstract
Most value function learning algorithms in reinforcement learning are based on the mean squared (projected) Bellman error. However, squared errors are known to be sensitive to outliers, both skewing the solution of the objective and resulting in high-magnitude and high-variance gradients. To control these high-magnitude updates, typical strategies in RL involve clipping gradients, clipping rewards, rescaling rewards, or clipping errors. While these strategies appear to be related to robust losses -- like the Huber loss -- they are built on semi-gradient update rules which do not minimize a known loss. In this work, we build on recent insights reformulating squared Bellman errors as a saddlepoint optimization problem and propose a saddlepoint reformulation for a Huber Bellman error and Absolute Bellman error. We start from a formalization of robust losses, then derive sound gradient-based approaches to minimize these losses in both the online off-policy prediction and control settings.
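To make the outlier sensitivity concrete, here is a minimal sketch comparing the gradient of a squared error with the gradient of the standard Huber loss, both taken with respect to a scalar error. It is not the saddlepoint formulation the abstract proposes; the function names, the threshold `tau`, and the sample error values are assumptions chosen only for illustration.

```python
def squared_loss_grad(delta):
    """Gradient of 0.5 * delta**2 with respect to delta."""
    return delta


def huber_loss(delta, tau=1.0):
    """Standard Huber loss: quadratic for |delta| <= tau, linear beyond it."""
    a = abs(delta)
    if a <= tau:
        return 0.5 * delta * delta
    return tau * (a - 0.5 * tau)


def huber_loss_grad(delta, tau=1.0):
    """Gradient of the Huber loss with respect to delta: delta clipped to [-tau, tau]."""
    return max(-tau, min(tau, delta))


# Hypothetical errors; the last one plays the role of an outlier.
errors = [0.1, -0.5, 2.0, 50.0]
squared_grads = [squared_loss_grad(d) for d in errors]
huber_grads = [huber_loss_grad(d) for d in errors]

for d, gs, gh in zip(errors, squared_grads, huber_grads):
    print(f"error={d:6.1f}  squared grad={gs:6.1f}  huber grad={gh:5.1f}")
```

Under the squared loss the outlier's gradient grows linearly with the error, while the Huber gradient is bounded by `tau`. This looks like error clipping, but as the abstract notes, applying such clipping inside a semi-gradient update does not by itself minimize a known loss.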
Related papers
- A Kernel Loss For Solving The Bellman Equation (2019)
- A Generalized Projected Bellman Error For Off-policy Value Estimation In Reinforcement Learning (2021)
- Symmetric Q-learning: Reducing Skewness Of Bellman Error In Online Reinforcement Learning (2024)
- A Robust Quantile Huber Loss With Interpretable Parameter Adjustment In Distributional Reinforcement Learning (2024)
- UDQL: Bridging The Gap Between MSE Loss And The Optimal Value Function In Offline Reinforcement Learning (2024)
- The Optimal Approximation Factors In Misspecified Off-policy Value Function Estimation (2023)
- Risk Bounds And Rademacher Complexity In Batch Reinforcement Learning (2021)
- Value Gradient Weighted Model-based Reinforcement Learning (2022)