Model-based Offline Reinforcement Learning With Reliability-guaranteed Sequence Modeling
2025 · Shenghong He
Abstract
Model-based offline reinforcement learning (MORL) aims to learn a policy by exploiting a dynamics model derived from an existing dataset. Applying conservative quantification to the dynamics model, most existing works on MORL generate trajectories that approximate the real data distribution to facilitate policy learning by using current information (e.g., the state and action at time step \(t\)). However, these works neglect the impact of historical information on environmental dynamics, leading to the generation of unreliable trajectories that may not align with the real data distribution. In this paper, we propose a new MORL algorithm, Reliability-guaranteed Transformer (RT), which can eliminate unreliable trajectories by calculating the cumulative reliability of the generated trajectory (i.e., using a weighted variational distance away from the real data). Moreover, by sampling candidate actions with high rewards, RT can efficiently generate high-return trajecto
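The idea of discarding generated trajectories by a cumulative, weighted distance from the real data can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not specify the weights, the threshold, or how the distributions are obtained. The sketch assumes discrete per-step distributions, geometric weights `gamma ** t`, and a fixed rejection threshold, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Dist = Sequence[float]  # a discrete probability distribution


def total_variation(p: Dist, q: Dist) -> float:
    """Total variation distance between two discrete distributions."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same support size")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def cumulative_distance(model: Sequence[Dist], reference: Sequence[Dist],
                        gamma: float = 0.9) -> float:
    """Weighted sum of per-step distances (assumed weights: gamma ** t)."""
    return sum((gamma ** t) * total_variation(p, q)
               for t, (p, q) in enumerate(zip(model, reference)))


def keep_reliable(candidates: Sequence[Tuple[Sequence[Dist], Sequence[Dist]]],
                  threshold: float, gamma: float = 0.9) -> List[int]:
    """Return indices of trajectories whose cumulative distance stays
    at or below the (assumed) threshold; the rest are discarded."""
    return [i for i, (model, reference) in enumerate(candidates)
            if cumulative_distance(model, reference, gamma) <= threshold]


# Two toy two-step trajectories: one matches the reference, one does not.
good = ([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])
bad = ([[1.0, 0.0], [1.0, 0.0]], [[0.0, 1.0], [0.0, 1.0]])
kept = keep_reliable([good, bad], threshold=0.1, gamma=0.5)
```

Here a lower cumulative distance plays the role of higher reliability; in this toy case only the first trajectory is kept.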
Related papers
- MOReL: Model-based Offline Reinforcement Learning (2020) 0.00
- Model-based Offline Reinforcement Learning With Adversarial Data Augmentation (2025) 0.00
- Model-based Offline Reinforcement Learning With Pessimism-modulated Dynamics Belief (2022) 0.00
- Offline Trajectory Optimization For Offline Reinforcement Learning (2024) 1.20
- Decision Mamba: A Multi-grained State Space Model With Self-evolution Regularization For Offline RL (2024) 0.00
- Offline Safe Reinforcement Learning Using Trajectory Classification (2024) 0.00
- Overcoming Model Bias For Robust Offline Deep Reinforcement Learning (2020) 11.58
- Belief-based Offline Reinforcement Learning For Delay-robust Policy Optimization (2025) 0.00