Offline RL With No OOD Actions: In-sample Learning Via Implicit Value Regularization
2023 · Haoran Xu, Li Jiang, Jianxiong Li, et al.
Abstract
Most offline reinforcement learning (RL) methods suffer from the trade-off between improving the policy to surpass the behavior policy and constraining the policy to limit the deviation from the behavior policy, as computing \(Q\)-values using out-of-distribution (OOD) actions will suffer from errors due to distributional shift. The recently proposed In-sample Learning paradigm (i.e., IQL), which improves the policy by quantile regression using only data samples, shows great promise because it learns an optimal policy without querying the value function of any unseen actions. However, it remains unclear how this type of method handles the distributional shift in learning the value function. In this work, we make a key finding that the in-sample learning paradigm arises under the Implicit Value Regularization (IVR) framework. This gives a deeper understanding of why the in-sample learning paradigm works, i.e., it applies implicit value regularization to the policy.
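To make the in-sample idea concrete, here is a minimal sketch of a value loss evaluated only on (state, action) pairs that appear in the dataset. It assumes the asymmetric squared (expectile-style) loss commonly associated with IQL. The function names, the value tau = 0.7, and the toy numbers are illustrative assumptions, not taken from this paper.

```python
def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    diff is Q(s, a) - V(s) for a logged (s, a) pair. With tau > 0.5,
    underestimates of Q by V (diff > 0) are penalized more heavily.
    """
    weight = tau if diff >= 0 else 1.0 - tau
    return weight * diff * diff


def in_sample_value_loss(q_values, v_values, tau=0.7):
    """Mean asymmetric loss over dataset pairs only.

    No action outside the dataset is ever evaluated: q_values holds
    Q(s, a) for logged actions, v_values holds V(s) for the same states.
    """
    if len(q_values) != len(v_values):
        raise ValueError("q_values and v_values must have the same length")
    losses = [expectile_loss(q - v, tau) for q, v in zip(q_values, v_values)]
    return sum(losses) / len(losses)


# Hypothetical numbers: Q(s, a) for three logged pairs and V(s) for the same states.
q_logged = [1.0, 2.0, 0.5]
v_pred = [1.5, 1.0, 0.5]
loss = in_sample_value_loss(q_logged, v_pred, tau=0.7)
```

Because the loss only ever sees Q at logged actions, the value estimate is pushed toward the upper range of in-dataset returns without any query at an unseen action.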
Related papers
- Mildly Conservative Q-learning For Offline Reinforcement Learning (2022)
- On Instrumental Variable Regression For Deep Offline Policy Evaluation (2021)
- IDQL: Implicit Q-learning As An Actor-critic Method With Diffusion Policies (2023)
- Diverse Randomized Value Functions: A Provably Pessimistic Approach For Offline Reinforcement Learning (2024)
- Instrumental Variable Value Iteration For Causal Offline Reinforcement Learning (2021)
- Offline Policy Optimization In RL With Variance Regularizaton (2022)
- PIQL: Projective Implicit Q-learning With Support Constraint For Offline Reinforcement Learning (2025)
- Offline Multi-agent Reinforcement Learning With Implicit Global-to-local Value Regularization (2023)