Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning
2024 · Ruhan Wang, Yu Yang, Zhishuai Liu, et al.
Abstract
We study offline off-dynamics reinforcement learning (RL), which uses data from an easily accessible source domain to enhance policy learning in a target domain with limited data. Our approach centers on return-conditioned supervised learning (RCSL), focusing in particular on Decision Transformer (DT)-type frameworks, which predict actions conditioned on a desired return and the complete trajectory history. Previous works address the dynamics shift problem by augmenting the reward in the trajectory from the source domain to match the optimal trajectory in the target domain. However, this strategy is not directly applicable to RCSL owing to (1) the unique form of the RCSL policy class, which explicitly depends on the return, and (2) the absence of a straightforward representation of the optimal trajectory distribution. We propose the Return Augmented (REAG) method for DT-type frameworks, where we augment the return in the source domain by aligning its distribution with that in
Related papers
- Return-aligned Decision Transformer (2024)
- When Does Return-conditioned Supervised Learning Work For Offline Reinforcement Learning? (2022)
- Double Check My Desired Return: Transformer With Target Alignment For Offline Reinforcement Learning (2025)
- Offline Trajectory Optimization For Offline Reinforcement Learning (2024)
- Q-learning Decision Transformer: Leveraging Dynamic Programming For Conditional Sequence Modelling In Offline RL (2022)
- Adversarially Robust Decision Transformer (2024)
- Q-value Regularized Decision Convformer For Offline Reinforcement Learning (2024)
- Decision Mamba: A Multi-grained State Space Model With Self-evolution Regularization For Offline RL (2024)