RvS: What Is Essential For Offline RL Via Supervised Learning?
2021 · Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, et al.
Abstract
Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL. When does this hold true, and which algorithmic components are necessary? Through extensive experiments, we boil supervised learning for offline RL down to its essential elements. In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results of substantially more complex methods based on TD learning or sequence modeling with Transformers. Carefully choosing model capacity (e.g., via regularization or architecture) and choosing which information to condition on (e.g., goals or rewards) are critical for performance. These insights serve as a field guide for practitioners doing Reinforcement Learning via Supervised Learning (which we coin "RvS learning"). They also probe the limits of existing RvS methods, which are comparatively weak on random data, and suggest a nu
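As a rough illustration of the recipe the abstract describes (a small feedforward MLP, conditioned on a goal or reward, trained by maximizing likelihood), here is a minimal sketch of the forward pass and the loss. Everything named here is hypothetical and not taken from the paper: the helpers `init_mlp`, `policy_mean`, and `nll_loss`, the reading of "two-layer" as two affine layers with one ReLU in between, and the unit-variance Gaussian action model (under which the negative log-likelihood reduces to half the squared error, up to a constant). No training loop is shown; in practice the loss would be minimized with an autodiff library.

```python
import math
import random


def init_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Randomly initialise two affine layers (hypothetical helper)."""
    rng = random.Random(seed)

    def layer(n_in, n_out):
        scale = 1.0 / math.sqrt(n_in)
        w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        b = [0.0] * n_out
        return w, b

    return [layer(in_dim, hidden_dim), layer(hidden_dim, out_dim)]


def affine(w, b, x):
    """Compute w @ x + b for a weight matrix stored as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def policy_mean(params, state, outcome):
    """Predict an action from the state concatenated with the conditioning
    variable (a goal or a reward target)."""
    (w1, b1), (w2, b2) = params
    x = list(state) + list(outcome)
    hidden = [max(0.0, v) for v in affine(w1, b1, x)]  # ReLU
    return affine(w2, b2, hidden)


def nll_loss(params, batch):
    """Mean negative log-likelihood of the dataset actions, assuming a
    unit-variance Gaussian policy (constant term dropped)."""
    total = 0.0
    for state, outcome, action in batch:
        pred = policy_mean(params, state, outcome)
        total += 0.5 * sum((p - a) ** 2 for p, a in zip(pred, action))
    return total / len(batch)


# Toy usage: 2-D state, 1-D conditioning variable, 2-D action.
params = init_mlp(in_dim=3, hidden_dim=4, out_dim=2)
batch = [([0.1, 0.2], [1.0], [0.5, -0.5]), ([0.3, -0.1], [0.0], [0.0, 0.2])]
loss = nll_loss(params, batch)
```

The only thing that distinguishes this from plain behavior cloning is the extra `outcome` input; which quantity to feed there (goals or rewards) is one of the design choices the abstract flags as critical.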
Related papers
- Representation Matters: Offline Pretraining For Sequential Decision Making (2021)
- When Does Return-conditioned Supervised Learning Work For Offline Reinforcement Learning? (2022)
- Expert-supervised Reinforcement Learning For Offline Policy Learning And Evaluation (2020)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Data Valuation For Offline Reinforcement Learning (2022)
- Double Check My Desired Return: Transformer With Target Alignment For Offline Reinforcement Learning (2025)
- An Optimistic Perspective On Offline Reinforcement Learning (2019)
- Leveraging Offline Data In Online Reinforcement Learning (2022)