Expressive Value Learning For Scalable Offline Reinforcement Learning
2025 · Nicolas Espinosa-Dice, Kiante Brantley, Wen Sun
Abstract
Reinforcement learning (RL) is a powerful paradigm for learning to make sequences of decisions. However, RL has yet to be fully leveraged in robotics, principally due to its lack of scalability. Offline RL offers a promising avenue by training agents on large, diverse datasets, avoiding the costly real-world interactions of online RL. Scaling offline RL to increasingly complex datasets requires expressive generative models such as diffusion and flow matching. However, existing methods typically depend on either backpropagation through time (BPTT), which is computationally prohibitive, or policy distillation, which introduces compounding errors and limits scalability to larger base policies. In this paper, we consider the question of how to develop a scalable offline RL approach without relying on distillation or backpropagation through time. We introduce Expressive Value Learning for Offline Reinforcement Learning (EVOR): a scalable offline RL approach that integrates both expressive p
Related papers
- Diffusion Policies As An Expressive Policy Class For Offline Reinforcement Learning (2022)
- Preferred-action-optimized Diffusion Policies For Offline Reinforcement Learning (2024)
- AWAC: Accelerating Online Reinforcement Learning With Offline Datasets (2020)
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)
- Data Valuation For Offline Reinforcement Learning (2022)
- MOReL: Model-based Offline Reinforcement Learning (2020)
- A Workflow For Offline Model-free Robotic Reinforcement Learning (2021)
- Boosting Offline Reinforcement Learning With Residual Generative Modeling (2021)