From Generative To Episodic: Sample-efficient Replicable Reinforcement Learning
2025 · Max Hopkins, Sihan Liu, Christopher Ye, et al.
Abstract
The epidemic failure of replicability across empirical science and machine learning has recently motivated the formal study of replicable learning algorithms [Impagliazzo et al. (2022)]. In batch settings where data comes from a fixed i.i.d. source (e.g., hypothesis testing, supervised learning), the design of data-efficient replicable algorithms is now more or less understood. In contrast, there remain significant gaps in our knowledge for control settings like reinforcement learning, where an agent must interact directly with a shifting environment. Karbasi et al. show that with access to a generative model of an environment with \(S\) states and \(A\) actions (the RL 'batch setting'), replicably learning a near-optimal policy costs only \(\tilde{O}(S^2A^2)\) samples. On the other hand, the best upper bound without a generative model jumps to \(\tilde{O}(S^7 A^7)\) [Eaton et al. (2024)] due to the substantial difficulty of environment exploration. This gap raises a key question in
Related papers
- Replicability In Reinforcement Learning (2023)
- Replicable Reinforcement Learning (2023)
- Replicable Reinforcement Learning With Linear Function Approximation (2025)
- List Replicable Reinforcement Learning (2025)
- Generative Adversarial Imagination For Sample Efficient Deep Reinforcement Learning (2019)
- Breaking The Sample Size Barrier In Model-based Reinforcement Learning With A Generative Model (2020)
- Beyond Expected Return: Accounting For Policy Reproducibility When Evaluating Reinforcement Learning Algorithms (2023)
- Sample Efficient Active Algorithms For Offline Reinforcement Learning (2026)