Statistical Guarantees For Offline Domain Randomization
2025 · Arnaud Fickinger, Abderrahim Bendahi, Stuart Russell
Abstract
Reinforcement-learning (RL) agents often struggle when transferred from simulation to the real world. A dominant strategy for reducing the sim-to-real gap is domain randomization (DR), which trains the policy across many simulators produced by sampling dynamics parameters; however, standard DR ignores offline data already available from the real system. We study offline domain randomization (ODR), which first fits a distribution over simulator parameters to an offline dataset. While a growing body of empirical work reports substantial gains with algorithms such as DROPO, the theoretical foundations of ODR remain largely unexplored. In this work, we cast ODR as maximum-likelihood estimation over a parametric simulator family and provide statistical guarantees: under mild regularity and identifiability conditions, the estimator is weakly consistent (it converges in probability to the true dynamics as data grows), and it becomes strongly consistent (i.e., it converges almost surely to the true d
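To make the maximum-likelihood view of ODR concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper or from DROPO: the one-dimensional dynamics s' = s + xi * a + noise, the Gaussian family N(mu, sigma^2) over the simulator parameter xi, the known noise level, the grid search used as the optimizer, and the function names `simulate`, `log_likelihood`, and `fit_odr`.

```python
import numpy as np


def simulate(n, xi_true=1.5, noise_std=0.1, seed=0):
    """Draw n offline transitions from a toy 'real' system (hypothetical)."""
    rng = np.random.default_rng(seed)
    s = rng.normal(size=n)                    # states
    a = rng.uniform(-1.0, 1.0, size=n)        # actions
    s_next = s + xi_true * a + noise_std * rng.normal(size=n)
    return s, a, s_next


def log_likelihood(mu, sigma, s, a, s_next, noise_std=0.1):
    """Log-likelihood of the data when xi ~ N(mu, sigma^2).

    Marginalizing xi gives s_next - s ~ N(mu * a, sigma^2 * a^2 + noise_std^2).
    """
    var = sigma**2 * a**2 + noise_std**2
    resid = s_next - s - mu * a
    return np.sum(-0.5 * np.log(2.0 * np.pi * var) - 0.5 * resid**2 / var)


def fit_odr(s, a, s_next, noise_std=0.1):
    """Grid-search MLE over (mu, sigma); a stand-in for a real optimizer."""
    mus = np.linspace(0.0, 3.0, 151)
    sigmas = np.linspace(0.0, 1.0, 51)
    best_ll, best_mu, best_sigma = -np.inf, None, None
    for mu in mus:
        for sigma in sigmas:
            ll = log_likelihood(mu, sigma, s, a, s_next, noise_std)
            if ll > best_ll:
                best_ll, best_mu, best_sigma = ll, mu, sigma
    return best_mu, best_sigma


# Fit the parameter distribution to a small offline dataset.
s, a, s_next = simulate(2000)
mu_hat, sigma_hat = fit_odr(s, a, s_next)
print(f"mu_hat={mu_hat:.2f}, sigma_hat={sigma_hat:.2f}")
```

In this toy setting the data come from a single true parameter, so the fitted mean should move toward that value and the fitted spread toward zero as the dataset grows, which is the kind of behavior the consistency statements in the abstract describe.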
Related papers
- Towards Data-driven Offline Simulations For Online Reinforcement Learning (2022)
- Data-efficient Domain Randomization With Bayesian Optimization (2020)
- Domain Generalization For Robust Model-based Offline Reinforcement Learning (2022)
- How To Pick The Domain Randomization Parameters For Sim-to-real Transfer Of Reinforcement Learning Policies? (2019)
- Understanding Domain Randomization For Sim-to-real Transfer (2021)
- Achieving The Asymptotically Optimal Sample Complexity Of Offline Reinforcement Learning: A DRO-based Approach (2023)
- Near-optimal Offline Reinforcement Learning Via Double Variance Reduction (2021)
- Bridging Distributionally Robust Learning And Offline RL: An Approach To Mitigate Distribution Shift And Partial Data Coverage (2023)