Is Value Learning Really The Main Bottleneck In Offline RL?
2024 · Seohong Park, Kevin Frans, Sergey Levine, et al.
Abstract
While imitation learning requires access to high-quality data, offline reinforcement learning (RL) should, in principle, perform similarly or better with substantially lower data quality by using a value function. However, current results indicate that offline RL often performs worse than imitation learning, and it is often unclear what holds back the performance of offline RL. Motivated by this observation, we aim to understand the bottlenecks in current offline RL algorithms. While poor performance of offline RL is typically attributed to an imperfect value function, we ask: is the main bottleneck of offline RL indeed in learning the value function, or something else? To answer this question, we perform a systematic empirical study of (1) value learning, (2) policy extraction, and (3) policy generalization in offline RL problems, analyzing how these components affect performance. We make two surprising observations. First, we find that the choice of a policy extraction algorithm sign…
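To make "policy extraction" concrete: once a value function has been learned, a separate objective turns it into a policy. The sketch below contrasts two common families of extraction objectives. Picking these two, advantage-weighted regression (AWR, a value-weighted behavioral cloning loss) and a behavior-constrained policy gradient loss in the style of DDPG+BC, is an illustrative assumption. The visible abstract does not name the specific methods compared. Actions are 1-D scalars, and the hyperparameters `beta`, `max_weight`, and `alpha` are hypothetical defaults, not values from the paper.

```python
import math
from typing import Sequence


def awr_weights(q_values: Sequence[float], v_values: Sequence[float],
                beta: float = 1.0, max_weight: float = 100.0) -> list[float]:
    """Per-sample weights exp(beta * A) with advantage A = Q(s, a) - V(s).

    Weights are clipped at max_weight (an assumed value) for numerical stability.
    """
    return [min(math.exp(beta * (q - v)), max_weight)
            for q, v in zip(q_values, v_values)]


def awr_loss(log_probs: Sequence[float], q_values: Sequence[float],
             v_values: Sequence[float], beta: float = 1.0,
             max_weight: float = 100.0) -> float:
    """Value-weighted behavioral cloning: -mean(w_i * log pi(a_i | s_i)).

    log_probs are the policy's log-likelihoods of the *dataset* actions,
    so this objective can only reweight actions that appear in the data.
    """
    if not log_probs:
        raise ValueError("need at least one sample")
    weights = awr_weights(q_values, v_values, beta, max_weight)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


def ddpg_bc_loss(q_of_policy_actions: Sequence[float],
                 policy_actions: Sequence[float],
                 dataset_actions: Sequence[float],
                 alpha: float = 1.0) -> float:
    """Behavior-constrained policy gradient objective, averaged over samples:

        -Q(s, pi(s)) + alpha * (pi(s) - a)^2

    The first term pushes the policy's own action toward higher Q-values;
    the second keeps it close to the dataset action a.
    """
    if not policy_actions:
        raise ValueError("need at least one sample")
    total = 0.0
    for q, pa, da in zip(q_of_policy_actions, policy_actions, dataset_actions):
        total += -q + alpha * (pa - da) ** 2
    return total / len(policy_actions)


if __name__ == "__main__":
    # Two samples: the first dataset action has advantage 1, the second has 0.
    print(awr_weights([1.0, 0.0], [0.0, 0.0]))
    print(awr_loss([-1.0, -2.0], [1.0, 0.0], [0.0, 0.0]))
    print(ddpg_bc_loss([1.0, 2.0], [0.5, 0.0], [0.0, 0.0]))
```

The two losses use the same learned value function in different ways. The AWR-style loss only reweights the log-likelihood of actions already in the dataset. The DDPG+BC-style loss, when minimized with respect to the policy's parameters, follows the value function's preference over the policy's own actions and is held near the data by the `alpha` penalty. This is the sense in which the choice of extraction objective can change how much of the value function the final policy actually exploits.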
Related papers
- Offline Reinforcement Learning: Fundamental Barriers For Value Function Approximation (2021)
- Expert Or Not? Assessing Data Quality In Offline Reinforcement Learning (2025)
- Data Valuation For Offline Reinforcement Learning (2022)
- Expressive Value Learning For Scalable Offline Reinforcement Learning (2025)
- POPO: Pessimistic Offline Policy Optimization (2020)
- Know Your Boundaries: The Necessity Of Explicit Behavioral Cloning In Offline RL (2022)
- Bridging Offline Reinforcement Learning And Imitation Learning: A Tale Of Pessimism (2021)
- A Perspective Of Q-value Estimation On Offline-to-online Reinforcement Learning (2023)