Generalized Decision Transformer For Offline Hindsight Information Matching
2021 · Hiroki Furuta, Yutaka Matsuo, Shixiang Shane Gu
Abstract
How to extract as much learning signal as possible from each trajectory has been a key problem in reinforcement learning (RL), where sample inefficiency poses serious challenges for practical applications. Recent works have shown that using expressive policy function approximators and conditioning on future trajectory information, such as future states in hindsight experience replay or returns-to-go in Decision Transformer (DT), enables efficient learning of multi-task policies, where at times online RL is fully replaced by offline behavioral cloning, e.g. sequence modeling. We demonstrate that all these approaches are doing hindsight information matching (HIM): training policies that can output the rest of a trajectory that matches some statistics of future state information. We present Generalized Decision Transformer (GDT) for solving any HIM problem, and show how different choices for the feature function and the anti-causal aggregator not only recover DT as a special case, but al
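To make the roles of the feature function and the anti-causal aggregator concrete, the following is a minimal sketch, not the authors' implementation. It assumes the aggregator is a backward (future-to-present) discounted sum and the feature is the per-step reward; under those assumptions, with a discount of 1, the aggregated quantity is the returns-to-go that DT conditions on. The function name `anti_causal_aggregate` and the toy reward values are hypothetical and chosen only for illustration.

```python
import numpy as np

def anti_causal_aggregate(features, gamma=1.0):
    """For each step t, return the discounted sum of features from t to the end.

    Illustrative assumption: the aggregator is a discounted sum computed
    backward in time, so the value at t depends only on steps >= t.
    """
    features = np.asarray(features, dtype=float)
    out = np.zeros_like(features)
    running = np.zeros_like(features[0])
    # Walk the trajectory from the last step to the first.
    for t in range(len(features) - 1, -1, -1):
        running = features[t] + gamma * running
        out[t] = running
    return out

# Hypothetical per-step rewards used as the feature of each step.
rewards = np.array([1.0, 0.0, 2.0, 1.0])

# With gamma = 1 the aggregate is the undiscounted returns-to-go.
returns_to_go = anti_causal_aggregate(rewards)
print(returns_to_go)  # [4. 3. 3. 1.]
```

Passing a different per-step feature (for example, a state vector of shape `(T, d)`) or a different `gamma` changes which statistic of the future the policy would be asked to match, while the backward aggregation itself stays the same.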
Related papers
- Q-learning Decision Transformer: Leveraging Dynamic Programming For Conditional Sequence Modelling In Offline RL (2022)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Enhancing Decision Transformer With Diffusion-based Trajectory Branch Generation (2024)
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)
- Q-value Regularized Decision Convformer For Offline Reinforcement Learning (2024)
- DODT: Enhanced Online Decision Transformer Learning Through Dreamer's Actor-critic Trajectory Forecasting (2024)
- HarmoDT: Harmony Multi-task Decision Transformer For Offline Reinforcement Learning (2024)
- Return-aligned Decision Transformer (2024)