Model-based Reinforcement Learning With Double Oracle Efficiency In Policy Optimization And Offline Estimation
2026 · Haichen Hu, Jian Qian, David Simchi-Levi
Abstract
arXiv:2605.00393v1. Reinforcement learning (RL) in large environments often suffers from severe computational bottlenecks, as conventional regret minimization algorithms require repeated, costly calls to planning and statistical estimation oracles. While recent advances have explored offline oracle-efficient algorithms, their computational complexity typically scales with the cardinality of the state and action spaces, rendering them intractable for large-scale or continuous environments. In this paper, we address this fundamental limitation by studying offline oracle-efficient episodic RL through the lens of log-barrier and log-determinant regularization. Specifically, for tabular Markov Decision Processes (MDPs), we propose a novel algorithm that achieves the optimal \(\tilde{O}(\sqrt{T})\) regret bound while requiring only \(O(H\log\log T)\) calls to both the offline statistical estimation and planning oracles when \(T\) is known and \(O(H\log T)\) call
Related papers
- Near-optimal Offline Reinforcement Learning Via Double Variance Reduction (2021) 0.00
- Oracle Inequalities For Model Selection In Offline Reinforcement Learning (2022) 0.00
- Offline Policy Evaluation For Reinforcement Learning With Adaptively Collected Data (2023) 0.00
- Pessimism In The Face Of Confounders: Provably Efficient Offline Reinforcement Learning In Partially Observable Markov Decision Processes (2022) 0.00
- Near-optimal Provable Uniform Convergence In Offline Policy Evaluation For Reinforcement Learning (2020) 0.00
- Sample And Oracle Efficient Reinforcement Learning For MDPs With Linearly-realizable Value Functions (2024) 0.00
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023) 3.58
- Exponential Lower Bounds For Batch Reinforcement Learning: Batch RL Can Be Exponentially Harder Than Online RL (2020) 0.00