Belief-based Offline Reinforcement Learning For Delay-robust Policy Optimization
2025 · Simon Sinong Zhan, Qingyuan Wu, Philip Wang, et al.
Abstract
Offline-to-online deployment of reinforcement-learning (RL) agents must bridge two gaps: (1) the sim-to-real gap, where real systems add latency and other imperfections not present in simulation, and (2) the interaction gap, where policies trained purely offline face out-of-distribution states during online execution because gathering new interaction data is costly or risky. Agents therefore have to generalize from static, delay-free datasets to dynamic, delay-prone environments. Standard offline RL learns from delay-free logs yet must act under delays that break the Markov assumption and hurt performance. We introduce DT-CORL (Delay-Transformer belief policy Constrained Offline RL), an offline-RL framework built to cope with delayed dynamics at deployment. DT-CORL (i) produces delay-robust actions with a transformer-based belief predictor even though it never sees delayed observations during training, and (ii) is markedly more sample-efficient than naïve history-augmentation baselin
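To make concrete why delays break the Markov assumption, and what a history-augmentation baseline conditions on, here is a minimal sketch of the standard augmented-state construction for a constant observation delay. It is not DT-CORL's transformer belief predictor. The names `DelayedObservationBuffer` and `roll_forward` are hypothetical, as is the toy dynamics function used in the usage lines, which is assumed known and deterministic purely for illustration.

```python
from collections import deque


class DelayedObservationBuffer:
    """Toy model of a constant observation delay (hypothetical helper).

    At time t the agent only sees the state from `delay` steps ago plus
    the actions it has taken since then. That pair is the augmented state.
    """

    def __init__(self, delay, initial_state):
        self.delay = delay
        # States produced by the environment; the oldest one is visible.
        self._pending = deque([initial_state])
        # Actions taken since the visible state (at most `delay` of them).
        self._actions = deque(maxlen=delay)

    def step(self, action, next_state):
        """Record the action taken and the (not yet visible) next state."""
        self._actions.append(action)
        self._pending.append(next_state)
        # Keep delay + 1 states so that index 0 is the delayed observation.
        while len(self._pending) > self.delay + 1:
            self._pending.popleft()

    def augmented_state(self):
        """Return (delayed state, actions taken since that state)."""
        return self._pending[0], tuple(self._actions)


def roll_forward(state, actions, dynamics):
    """Estimate the current state by replaying actions through `dynamics`.

    Exact only if `dynamics` is the true deterministic transition function;
    in general this would be replaced by a learned belief over states.
    """
    for action in actions:
        state = dynamics(state, action)
    return state


# Usage with assumed toy dynamics s' = s + a and a delay of 2 steps.
toy_dynamics = lambda s, a: s + a
buffer = DelayedObservationBuffer(delay=2, initial_state=0)
true_state = 0
for a in (1, 2, 3):
    true_state = toy_dynamics(true_state, a)
    buffer.step(a, true_state)

seen_state, recent_actions = buffer.augmented_state()
estimate = roll_forward(seen_state, recent_actions, toy_dynamics)
```

The delayed state alone does not determine what happens next, but the delayed state together with the intervening actions does. The cost is that the augmented input grows with the delay length, which is the sample-efficiency concern the abstract raises about history augmentation.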
Related papers
- Self-confirming Transformer For Belief-conditioned Adaptation In Offline Multi-agent Reinforcement Learning (2023)
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)
- Solving Continual Offline Reinforcement Learning With Decision Transformer (2024)
- Robust Offline Reinforcement Learning With Gradient Penalty And Constraint Relaxation (2022)
- MOReL: Model-based Offline Reinforcement Learning (2020)
- Deployment-efficient Reinforcement Learning Via Model-based Offline Optimization (2020)
- Active Advantage-aligned Online Reinforcement Learning With Offline Data (2025)
- PROTO: Iterative Policy Regularized Offline-to-online Reinforcement Learning (2023)