Double Check My Desired Return: Transformer With Target Alignment For Offline Reinforcement Learning
2025 · Yue Pei, Hongming Zhang, Chao Gao, et al.
Abstract
Offline reinforcement learning (RL) has achieved significant advances in domains such as robotic control, autonomous driving, and medical decision-making. Most existing methods primarily focus on training policies that maximize cumulative returns from a given dataset. However, many real-world applications require precise control over policy performance levels, rather than simply pursuing the best possible return. Reinforcement learning via supervised learning (RvS) frames offline RL as a sequence modeling task, enabling the extraction of diverse policies by conditioning on different desired returns. Yet, existing RvS-based transformers, such as Decision Transformer (DT), struggle to reliably align the actual achieved returns with specified target returns, especially when interpolating within underrepresented returns or extrapolating beyond the dataset. To address this limitation, we propose Doctor, a novel approach that Double Checks the Transformer with target alignment for Offline RL.
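To make "conditioning on different desired returns" concrete, the sketch below shows the standard Decision Transformer return-to-go bookkeeping, not Doctor's own method. During training, each step is labeled with the (undiscounted, as in the original DT) sum of future rewards. During evaluation, the desired return is decremented by each observed reward. The function names `returns_to_go`, `update_target`, and `alignment_error` are hypothetical, and `alignment_error` (mean absolute gap between target and achieved returns) is an illustrative assumption, not a metric taken from the paper.

```python
from typing import List, Sequence


def returns_to_go(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go R_t = sum_{k>=t} gamma^(k-t) * r_k for every step t.

    DT conditions each action on this quantity; gamma=1.0 (undiscounted)
    matches the original Decision Transformer setup.
    """
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last step backwards so each entry sees all future rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        rtg[t] = running
    return rtg


def update_target(target_rtg: float, reward: float) -> float:
    """At evaluation time, DT-style agents subtract each observed reward from the desired return."""
    return target_rtg - reward


def alignment_error(targets: Sequence[float], achieved: Sequence[float]) -> float:
    """Hypothetical metric: mean absolute gap between desired and achieved returns."""
    if len(targets) != len(achieved) or not targets:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(t - a) for t, a in zip(targets, achieved)) / len(targets)


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0]
    print(returns_to_go(rewards))  # [3.0, 2.0, 2.0]

    # Roll a desired return of 3.0 through the same rewards.
    target = 3.0
    for r in rewards:
        target = update_target(target, r)
    print(target)  # 0.0: the achieved return matched the desired return

    print(alignment_error([100.0, 200.0], [90.0, 150.0]))  # 30.0
```

A residual target that stays far from zero at the end of an episode is exactly the misalignment the abstract describes. The agent was asked for one return and achieved another, which is most likely when the requested return is rare in, or absent from, the dataset.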
Related papers
- Return-aligned Decision Transformer (2024)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)
- Self-confirming Transformer For Belief-conditioned Adaptation In Offline Multi-agent Reinforcement Learning (2023)
- Belief-based Offline Reinforcement Learning For Delay-robust Policy Optimization (2025)
- Offline Trajectory Optimization For Offline Reinforcement Learning (2024)
- Q-learning Decision Transformer: Leveraging Dynamic Programming For Conditional Sequence Modelling In Offline RL (2022)
- Waypoint Transformer: Reinforcement Learning Via Supervised Learning With Intermediate Targets (2023)