Decision Mamba: A Multi-grained State Space Model With Self-evolution Regularization For Offline RL
2024 · Qi Lv, Xiang Deng, Gongwei Chen, et al.
Abstract
While conditional sequence modeling with the transformer architecture has proven effective on offline reinforcement learning (RL) tasks, it struggles to handle out-of-distribution states and actions. Existing work attempts to address this issue through data augmentation with the learned policy or by adding extra constraints from value-based RL algorithms. However, these studies still fail to overcome the following challenges: (1) insufficient use of the historical temporal information across steps, (2) overlooking the local intra-step relationships among returns-to-go (RTGs), states, and actions, and (3) overfitting to suboptimal trajectories with noisy labels. To address these challenges, we propose Decision Mamba (DM), a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy. DM explicitly models the historical hidden state to extract temporal information by using the Mamba architecture. To capture the relationship
Related papers
- Decision Mamba: Reinforcement Learning Via Sequence Modeling With Selective State Spaces (2024)
- Drama: Mamba-enabled Model-based Reinforcement Learning Is Sample And Parameter Efficient (2024)
- Self-confirming Transformer For Belief-conditioned Adaptation In Offline Multi-agent Reinforcement Learning (2023)
- Model-based Offline Reinforcement Learning With Reliability-guaranteed Sequence Modeling (2025)
- Q-learning Decision Transformer: Leveraging Dynamic Programming For Conditional Sequence Modelling In Offline RL (2022)
- HarmoDT: Harmony Multi-task Decision Transformer For Offline Reinforcement Learning (2024)
- Overcoming Model Bias For Robust Offline Deep Reinforcement Learning (2020)
- Return Augmented Decision Transformer For Off-dynamics Reinforcement Learning (2024)