IDQL: Implicit Q-learning As An Actor-critic Method With Diffusion Policies
2023 · Philippe Hansen-Estruch, Ilya Kostrikov, Michael Janner, et al.
Abstract
Effective offline RL methods require properly handling out-of-distribution actions. Implicit Q-learning (IQL) addresses this by training a Q-function using only dataset actions through a modified Bellman backup. However, it is unclear which policy actually attains the values represented by this implicitly trained Q-function. In this paper, we reinterpret IQL as an actor-critic method by generalizing the critic objective and connecting it to a behavior-regularized implicit actor. This generalization shows how the induced actor balances reward maximization and divergence from the behavior policy, with the specific loss choice determining the nature of this tradeoff. Notably, this actor can exhibit complex and multimodal characteristics, suggesting issues with the conditional Gaussian actor fit with advantage weighted regression (AWR) used in prior methods. Instead, we propose using samples from a diffusion parameterized behavior policy and weights computed from the critic to then importa
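As a rough illustration of the two ingredients the abstract names (a critic trained only on dataset actions, and critic-derived weights applied to samples from a behavior policy), the following is a minimal sketch, not the authors' implementation. It assumes the expectile loss from the original IQL formulation as the critic objective and assumes the reweighting is done by resampling among candidate actions. The names `expectile_loss` and `resample_action` are hypothetical, and the `candidates` list stands in for actions drawn from a diffusion behavior policy.

```python
import random


def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss |tau - 1(diff < 0)| * diff**2.

    `diff` plays the role of Q(s, a) - V(s) for a dataset action.
    With tau > 0.5, positive differences are penalized more, which
    pushes V(s) toward an upper expectile of Q over dataset actions.
    """
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff ** 2


def resample_action(candidates, q_values, v_value, tau=0.7, rng=random):
    """Pick one candidate action using critic-derived weights.

    Assumption for illustration: candidates with positive advantage
    (Q > V) get weight tau, the rest get weight 1 - tau. `candidates`
    would come from a learned behavior policy in practice.
    """
    weights = [tau if q > v_value else 1.0 - tau for q in q_values]
    return rng.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical candidate actions and critic values for one state.
    actions = ["left", "stay", "right"]
    q_vals = [0.2, 0.5, 1.3]
    print(resample_action(actions, q_vals, v_value=0.6, tau=0.9))
```

Under these assumptions, setting `tau` close to 1 concentrates the resampling on candidates the critic rates above the state value, while `tau = 0.5` leaves the behavior samples unweighted.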
Related papers
- AlignIQL: Policy Alignment In Implicit Q-learning Through Constrained Optimization (2024) 0.00
- Diffusion Actor-critic: Formulating Constrained Policy Iteration As Diffusion Noise Regression For Offline Reinforcement Learning (2024) 2.92
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022) 0.00
- Offline RL With No OOD Actions: In-sample Learning Via Implicit Value Regularization (2023) 0.00
- Learning A Diffusion Model Policy From Rewards Via Q-score Matching (2023) 0.00
- PIQL: Projective Implicit Q-learning With Support Constraint For Offline Reinforcement Learning (2025) 0.00
- Diffusion Policies Creating A Trust Region For Offline Reinforcement Learning (2024) 8.04
- DIAR: Diffusion-model-guided Implicit Q-learning With Adaptive Revaluation (2024) 0.00