PIQL: Projective Implicit Q-learning With Support Constraint For Offline Reinforcement Learning
2025 · Xinchen Han, Hossam Afifi, Michel Marot
Abstract
Offline Reinforcement Learning (RL) faces a fundamental challenge of extrapolation errors caused by out-of-distribution (OOD) actions. Implicit Q-Learning (IQL) employs expectile regression to achieve in-sample learning. Nevertheless, IQL relies on a fixed expectile hyperparameter and a density-based policy improvement method, both of which impede its adaptability and performance. In this paper, we propose Projective IQL (PIQL), a projective variant of IQL enhanced with a support constraint. In the policy evaluation stage, PIQL substitutes the fixed expectile hyperparameter with a projection-based parameter and extends the one-step value estimation to a multi-step formulation. In the policy improvement stage, PIQL adopts a support constraint instead of a density constraint, ensuring closer alignment with the policy evaluation. Theoretically, we demonstrate that PIQL maintains the expectile regression and in-sample learning framework, guarantees monotonic policy improvement, and introdu
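The expectile regression that the abstract attributes to IQL can be illustrated with a minimal sketch. The code below shows only the standard fixed-expectile loss, not PIQL's projection-based parameter or its multi-step estimate, which the abstract does not specify. The function names `expectile_loss` and `fit_expectile`, the default `tau=0.7`, and the fixed-point solver are assumptions made for illustration.

```python
import numpy as np


def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss: mean of |tau - 1(diff < 0)| * diff**2.

    In IQL, diff would typically be Q(s, a) - V(s) on dataset samples;
    here it is just an array of residuals (illustrative assumption).
    """
    diff = np.asarray(diff, dtype=float)
    # Negative residuals get weight (1 - tau), the rest get weight tau.
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))


def fit_expectile(samples, tau=0.7, iters=200):
    """Estimate the tau-expectile of samples by fixed-point iteration.

    The minimizer v of the expectile loss satisfies
    sum_i w_i * (x_i - v) = 0, so v is a weighted mean whose
    weights depend on which side of v each sample falls.
    """
    samples = np.asarray(samples, dtype=float)
    v = samples.mean()  # tau = 0.5 recovers the ordinary mean
    for _ in range(iters):
        w = np.where(samples - v < 0, 1.0 - tau, tau)
        v = np.sum(w * samples) / np.sum(w)
    return float(v)


if __name__ == "__main__":
    q_values = [0.0, 1.0, 2.0, 3.0]
    for tau in (0.5, 0.7, 0.9):
        print(tau, fit_expectile(q_values, tau))
```

As `tau` moves from 0.5 toward 1, the fitted value moves from the sample mean toward the sample maximum while never leaving the range of the observed samples. This is the sense in which expectile regression approximates a maximum over in-sample actions without querying OOD actions.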
Related papers
- AlignIQL: Policy Alignment In Implicit Q-learning Through Constrained Optimization (2024)
- Believe What You See: Implicit Constraint Approach For Offline Multi-agent Reinforcement Learning (2021)
- Projected Off-policy Q-learning (POP-QL) For Stabilizing Offline Reinforcement Learning (2023)
- Quantile Q-learning: Revisiting Offline Extreme Q-learning With Quantile Regression (2025)
- Offline RL With No OOD Actions: In-sample Learning Via Implicit Value Regularization (2023)
- IDQL: Implicit Q-learning As An Actor-critic Method With Diffusion Policies (2023)
- Model-based Offline Reinforcement Learning With Lower Expectile Q-learning (2024)
- Boosting Offline Reinforcement Learning With Residual Generative Modeling (2021)