Q-learning Decision Transformer: Leveraging Dynamic Programming For Conditional Sequence Modelling In Offline RL
2022 · Taku Yamagata, Ahmed Khalil, Raul Santos-Rodriguez
Abstract
Recent works have shown that tackling offline reinforcement learning (RL) with a conditional policy produces promising results. The Decision Transformer (DT) combines the conditional policy approach and a transformer architecture, showing competitive performance against several benchmarks. However, DT lacks stitching ability -- one of the critical abilities for offline RL to learn the optimal policy from sub-optimal trajectories. This issue becomes particularly significant when the offline dataset only contains sub-optimal trajectories. On the other hand, the conventional RL approaches based on Dynamic Programming (such as Q-learning) do not have the same limitation; however, they suffer from unstable learning behaviours, especially when they rely on function approximation in an off-policy learning setting. In this paper, we propose the Q-learning Decision Transformer (QDT) to address the shortcomings of DT by leveraging the benefits of Dynamic Programming (Q-learning). It utilises the
Related papers
- Q-value Regularized Decision Convformer For Offline Reinforcement Learning (2024)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Quantum Decision Transformers (QDT): Synergistic Entanglement And Interference For Offline Reinforcement Learning (2025)
- Generalized Decision Transformer For Offline Hindsight Information Matching (2021)
- Decision Mamba: A Multi-grained State Space Model With Self-evolution Regularization For Offline RL (2024)
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)
- Enhancing Decision Transformer With Diffusion-based Trajectory Branch Generation (2024)
- DODT: Enhanced Online Decision Transformer Learning Through Dreamer's Actor-critic Trajectory Forecasting (2024)