Reinforcement Learning With Random Delays
2020 · Simon Ramstedt, Yann Bouteiller, Giovanni Beltrame, et al.
Abstract
Action and observation delays commonly occur in many Reinforcement Learning applications, such as remote control scenarios. We study the anatomy of randomly delayed environments, and show that partially resampling trajectory fragments in hindsight allows for off-policy multi-step value estimation. We apply this principle to derive Delay-Correcting Actor-Critic (DCAC), an algorithm based on Soft Actor-Critic with significantly better performance in environments with delays. This is shown theoretically and also demonstrated practically on a delay-augmented version of the MuJoCo continuous control benchmark.
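To make the setting concrete, here is a minimal sketch of a randomly delayed environment. It is not the authors' implementation or their MuJoCo benchmark. `ToyEnv`, `RandomActionDelay`, and all parameters are hypothetical names introduced for illustration. The sketch models action delays only, and it assumes a device common in delayed RL: each observation is augmented with a buffer of the most recently sent actions.

```python
import random
from collections import deque


class ToyEnv:
    """Hypothetical 1-D integrator: the state is the running sum of applied actions."""

    def __init__(self):
        self.state = 0.0

    def reset(self):
        self.state = 0.0
        return self.state

    def step(self, action):
        self.state += action
        reward = -abs(self.state)  # reward is highest when the state stays near zero
        return self.state, reward


class RandomActionDelay:
    """Illustrative wrapper: each action reaches the environment after a random delay.

    Assumption: the environment always applies the most recently *sent* action
    among those that have already arrived, so late arrivals of older actions
    are discarded.
    """

    def __init__(self, env, max_delay, default_action=0.0, seed=0):
        self.env = env
        self.max_delay = max_delay
        self.default_action = default_action
        self.rng = random.Random(seed)

    def reset(self):
        self.t = 0
        self.in_flight = []  # entries are (arrival_time, send_time, action)
        self.applied = (-1, self.default_action)  # (send_time, action) in effect
        self.buffer = deque(
            [self.default_action] * self.max_delay, maxlen=self.max_delay
        )
        obs = self.env.reset()
        return obs, tuple(self.buffer)

    def step(self, action):
        # Sample how many steps this action spends in transit (0 = no delay).
        delay = self.rng.randint(0, self.max_delay)
        self.in_flight.append((self.t + delay, self.t, action))

        # Deliver everything that has arrived by now; keep the newest-sent one.
        arrived = [m for m in self.in_flight if m[0] <= self.t]
        self.in_flight = [m for m in self.in_flight if m[0] > self.t]
        for _, sent, act in arrived:
            if sent > self.applied[0]:
                self.applied = (sent, act)

        obs, reward = self.env.step(self.applied[1])
        self.buffer.append(action)  # the agent remembers what it has sent
        self.t += 1
        return (obs, tuple(self.buffer)), reward
```

With `max_delay=0` the wrapper reduces to the undelayed environment. With a larger `max_delay`, the action buffer in the observation tells the agent which of its commands may still be in transit.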
Related papers
- Reinforcement Learning Via Conservative Agent For Environments With Random Delays (2025)
- Model-based Reinforcement Learning Under Random Observation Delays (2025)
- Revisiting State Augmentation Methods For Reinforcement Learning With Stochastic Delays (2021)
- Delay-aware Multi-agent Reinforcement Learning For Cooperative And Competitive Environments (2020)
- Reinforcement Learning For Control Systems With Time Delays: A Comprehensive Survey (2026)
- Reinforcement Learning With Delayed, Composite, And Partially Anonymous Reward (2023)
- Multi-agent Reinforcement Learning With Reward Delays (2022)
- Boosting Reinforcement Learning With Strongly Delayed Feedback Through Auxiliary Short Delays (2024)