You Can't Count On Luck: Why Decision Transformers And RvS Fail In Stochastic Environments
2022 · Keiran Paster, Sheila McIlraith, Jimmy Ba
Abstract
Recently, methods such as Decision Transformer that reduce reinforcement learning to a prediction task and solve it via supervised learning (RvS) have become popular due to their simplicity, robustness to hyperparameters, and strong overall performance on offline RL tasks. However, simply conditioning a probabilistic model on a desired return and taking the predicted action can fail dramatically in stochastic environments since trajectories that result in a return may have only achieved that return due to luck. In this work, we describe the limitations of RvS approaches in stochastic environments and propose a solution. Rather than simply conditioning on the return of a single trajectory as is standard practice, our proposed method, ESPER, learns to cluster trajectories and conditions on average cluster returns, which are independent from environment stochasticity. Doing so allows ESPER to achieve strong alignment between target return and expected performance in real environments. We
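The failure mode described in the abstract, and the idea of conditioning on an average cluster return instead, can be illustrated with a toy example. The sketch below is not the authors' implementation: the one-step dataset, the action names `gamble` and `safe`, and all function names are hypothetical, and ESPER's learned clustering is replaced here by the simplest possible stand-in (one cluster per action).

```python
from collections import defaultdict

# Hypothetical one-step offline dataset of (action, return) pairs.
# "gamble" pays +5 in 10% of the logged trajectories and -5 otherwise;
# "safe" always pays +1.
dataset = (
    [("gamble", 5.0)] * 10      # lucky outcomes
    + [("gamble", -5.0)] * 90   # unlucky outcomes
    + [("safe", 1.0)] * 100     # deterministic outcome
)

def rvs_action(data, target_return):
    """Naive return conditioning: the most frequent action among
    trajectories whose observed return equals the target.
    Raises ValueError if no trajectory has that return."""
    counts = defaultdict(int)
    for action, ret in data:
        if ret == target_return:
            counts[action] += 1
    return max(counts, key=counts.get)

def cluster_average_returns(data):
    """Average return per cluster. Assumption: the cluster is simply
    the action taken, standing in for a learned clustering."""
    totals, counts = defaultdict(float), defaultdict(int)
    for action, ret in data:
        totals[action] += ret
        counts[action] += 1
    return {a: totals[a] / counts[a] for a in totals}

def cluster_conditioned_action(data, target_return):
    """Condition on the cluster-average return instead of the
    single-trajectory return: pick the closest cluster."""
    averages = cluster_average_returns(data)
    return min(averages, key=lambda a: abs(averages[a] - target_return))

expected = cluster_average_returns(dataset)
naive = rvs_action(dataset, target_return=5.0)
clustered = cluster_conditioned_action(dataset, target_return=5.0)

print("average return per cluster:", expected)
print("naive return conditioning picks:", naive)
print("cluster-average conditioning picks:", clustered)
```

In this toy dataset, asking for a return of 5 leads naive conditioning to the action that only reaches 5 through luck, while conditioning on the cluster averages selects the action whose expected return is closest to what is actually achievable.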
Related papers
- Adversarially Robust Decision Transformer (2024)
- Reinforcement Learning With Non-ergodic Reward Increments: Robustness Via Ergodicity Transformations (2023)
- RvS: What Is Essential For Offline RL Via Supervised Learning? (2021)
- Successor Uncertainties: Exploration And Uncertainty In Temporal Difference Learning (2018)
- On The Convergence And Stability Of Upside-down Reinforcement Learning, Goal-conditioned Supervised Learning, And Online Decision Transformers (2025)
- Streetwise Agents: Empowering Offline RL Policies To Outsmart Exogenous Stochastic Disturbances In RTC (2024)
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)
- Stochastic Reinforcement Learning (2019)