Q-value Regularized Decision Convformer For Offline Reinforcement Learning
2024 · Teng Yan, Zhendong Ruan, Yaobang Cai, et al.
Abstract
As a data-driven paradigm, offline reinforcement learning (offline RL) has been formulated as sequence modeling, where the Decision Transformer (DT) has demonstrated exceptional capabilities. Unlike previous reinforcement learning methods that fit value functions or compute policy gradients, DT conditions an autoregressive model on the expected returns, past states, and actions, using a causally masked Transformer to output the optimal action. However, because the returns sampled within a single trajectory are inconsistent with the optimal returns across multiple trajectories, it is challenging to set an expected return that yields the optimal action and stitches together suboptimal trajectories. Compared to DT, Decision ConvFormer (DC) is easier to understand in the context of modeling RL trajectories within a Markov Decision Process. We propose the Q-value Regularized Decision ConvFormer (QDC), which combines the understanding of RL trajectories by DC and incorporates a term th
Related papers
- Q-learning Decision Transformer: Leveraging Dynamic Programming For Conditional Sequence Modelling In Offline RL (2022)
- When Should We Prefer Decision Transformers For Offline Reinforcement Learning? (2023)
- Quantum Decision Transformers (QDT): Synergistic Entanglement And Interference For Offline Reinforcement Learning (2025)
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)
- Solving Continual Offline Reinforcement Learning With Decision Transformer (2024)
- Generalized Decision Transformer For Offline Hindsight Information Matching (2021)
- Decision Mamba: A Multi-grained State Space Model With Self-evolution Regularization For Offline RL (2024)
- Reinforcement Learning Gradients As Vitamin For Online Finetuning Decision Transformers (2024)