Model-based Offline Reinforcement Learning With Lower Expectile Q-learning
2024 · Kwanyoung Park, Youngwoon Lee
Abstract
Model-based offline reinforcement learning (RL) is a compelling approach that addresses the challenge of learning from limited, static data by generating imaginary trajectories using learned models. However, these approaches often struggle with inaccurate value estimation from model rollouts. In this paper, we introduce a novel model-based offline RL method, Lower Expectile Q-learning (LEQ), which provides a low-bias model-based value estimation via lower expectile regression of \(\lambda\)-returns. Our empirical results show that LEQ significantly outperforms previous model-based offline RL methods on long-horizon tasks, such as the D4RL AntMaze tasks, matching or surpassing the performance of model-free approaches and sequence modeling approaches. Furthermore, LEQ matches the performance of state-of-the-art model-based and model-free methods in dense-reward environments across both state-based tasks (NeoRL and D4RL) and pixel-based tasks (V-D4RL), showing that LEQ works robustly acro
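The two ingredients named in the abstract, \(\lambda\)-returns and lower expectile regression, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the default values of `tau`, `gamma`, and `lam`, and the scalar gradient-descent fit are all assumptions chosen for illustration. The \(\lambda\)-return uses the standard recursion \(G_t = r_t + \gamma[(1-\lambda)V(s_{t+1}) + \lambda G_{t+1}]\), bootstrapped from the last value estimate, and the expectile loss weights overestimates by \(1-\tau\) and underestimates by \(\tau\), so \(\tau < 0.5\) pulls the estimate toward the lower targets.

```python
import numpy as np

def lambda_returns(rewards, values, gamma=0.99, lam=0.95):
    """Standard lambda-returns for one rollout of length H.

    rewards: shape (H,), rewards r_0 .. r_{H-1}
    values:  shape (H+1,), value estimates for s_0 .. s_H (assumed given)
    """
    H = len(rewards)
    returns = np.zeros(H)
    next_return = values[H]  # bootstrap from the final state's value
    for t in reversed(range(H)):
        next_return = rewards[t] + gamma * (
            (1.0 - lam) * values[t + 1] + lam * next_return
        )
        returns[t] = next_return
    return returns

def expectile_loss(pred, target, tau=0.1):
    """Asymmetric squared error: weight tau when target > pred, else 1 - tau."""
    u = np.asarray(target, dtype=float) - np.asarray(pred, dtype=float)
    weight = np.where(u > 0, tau, 1.0 - tau)
    return float(np.mean(weight * u ** 2))

def fit_expectile(targets, tau, steps=2000, lr=0.05):
    """Fit a single scalar to `targets` by gradient descent on the expectile loss."""
    targets = np.asarray(targets, dtype=float)
    q = float(np.mean(targets))
    for _ in range(steps):
        u = targets - q
        weight = np.where(u > 0, tau, 1.0 - tau)
        grad = -2.0 * np.mean(weight * u)  # d(loss)/dq
        q -= lr * grad
    return q

# Hypothetical usage: a lower expectile (tau < 0.5) sits below the mean of the targets.
targets = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
q_mean = fit_expectile(targets, tau=0.5)  # tau = 0.5 recovers the mean
q_low = fit_expectile(targets, tau=0.1)   # conservative estimate
```

With `tau = 0.5` the loss reduces to ordinary mean squared error, so the asymmetry is the only thing that makes the estimate conservative. In a Q-learning setting the scalar `q` would be replaced by a network's prediction and `targets` by \(\lambda\)-returns computed from model rollouts.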