Near-optimal Sample Complexities Of Divergence-based S-rectangular Distributionally Robust Reinforcement Learning

·2026


Abstract

arXiv:2505.12202v3

Distributionally robust reinforcement learning (DR-RL) has recently gained significant attention as a principled approach to addressing discrepancies between training and testing environments. To balance robustness, conservatism, and computational tractability, the literature has introduced DR-RL models with SA-rectangular and S-rectangular adversaries. Most existing statistical analyses focus on SA-rectangular models, owing to their algorithmic simplicity and the optimality of deterministic policies. S-rectangular models, however, capture distributional discrepancies more accurately in many real-world applications and often yield more effective robust randomized policies. In this paper, we study the empirical value iteration algorithm for divergence-based S-rectangular DR-RL and establish near-optimal sample complexity bounds of \(\widetilde{O}(|\mathcal{S}||\mathcal{A}|(1-\gamma)^{-4}\epsilon^{-2})\), where \(\epsilon\) is t

