Return-aligned Decision Transformer
2024 · Tsunehiko Tanaka, Kenshi Abe, Kaito Ariu, et al.
Abstract
Traditional approaches in offline reinforcement learning aim to learn the optimal policy that maximizes the cumulative reward, also known as return. It is increasingly important to adjust the performance of AI agents to meet human requirements, for example, in applications like video games and education tools. Decision Transformer (DT) optimizes a policy that generates actions conditioned on the target return through supervised learning and includes a mechanism to control the agent's performance using the target return. However, the action generation is hardly influenced by the target return because DT's self-attention allocates little attention to the return tokens. In this paper, we propose Return-Aligned Decision Transformer (RADT), designed to more effectively align the actual return with the target return. RADT leverages features extracted by paying attention solely to the return, enabling action generation to consistently depend on the target return. Extensive experiments
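The contrast the abstract draws, between self-attention over a mixed token sequence and attention paid solely to the return, can be illustrated with a toy calculation. The sketch below is not the RADT architecture; it only shows, under assumed toy embeddings, that when return tokens compete with state and action tokens inside one softmax they can receive a small share of the attention weight, whereas attending only to return tokens puts all of the weight on them by construction. All names and numeric values (`return_tokens`, `other_tokens`, `query`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, weights


# Hypothetical 2-d embeddings: return tokens and state/action tokens.
return_tokens = [[1.0, 0.0], [0.9, 0.1]]
other_tokens = [[0.0, 1.0], [0.1, 0.9], [0.0, 1.2]]

# Hypothetical query, assumed to be more similar to state/action tokens.
query = [0.2, 1.0]

# Mixed-sequence attention: return tokens share one softmax with the rest.
mixed = return_tokens + other_tokens
_, w_mixed = attend(query, mixed, mixed)
return_mass_mixed = sum(w_mixed[: len(return_tokens)])

# Return-only attention: keys and values are the return tokens alone,
# so the weights over return tokens sum to 1.
return_features, w_return = attend(query, return_tokens, return_tokens)
return_mass_only = sum(w_return)

print(f"attention mass on return tokens (mixed sequence): {return_mass_mixed:.3f}")
print(f"attention mass on return tokens (return only):    {return_mass_only:.3f}")
```

In this toy setting the extracted `return_features` vector is a convex combination of return embeddings only, which is one way to read "features extracted by paying attention solely to the return"; how RADT combines such features with the rest of the model is not specified in the abstract.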
Related papers
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)
- Adversarially Robust Decision Transformer (2024)
- Double Check My Desired Return: Transformer With Target Alignment For Offline Reinforcement Learning (2025)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Reinforcement Learning Gradients As Vitamin For Online Finetuning Decision Transformers (2024)
- DODT: Enhanced Online Decision Transformer Learning Through Dreamer's Actor-critic Trajectory Forecasting (2024)
- Waypoint Transformer: Reinforcement Learning Via Supervised Learning With Intermediate Targets (2023)
- Q-learning Decision Transformer: Leveraging Dynamic Programming For Conditional Sequence Modelling In Offline RL (2022)