Exponential Lower Bounds For Batch Reinforcement Learning: Batch RL Can Be Exponentially Harder Than Online RL
2020 Β· Andrea Zanette
Abstract
Several practical applications of reinforcement learning involve an agent learning from past data without the possibility of further exploration. Often these applications require us to 1) identify a near optimal policy or to 2) estimate the value of a target policy. For both tasks we derive *exponential* information-theoretic lower bounds in discounted infinite horizon MDPs with a linear function representation for the action value function even if 1) *realizability* holds, 2) the batch algorithm observes the exact reward and transition *functions*, and 3) the batch algorithm is given the *best* a priori data distribution for the problem class. Our work introduces a new `oracle + batch algorithm' framework to prove lower bounds that hold for every distribution. The work shows an exponential separation between batch and online reinforcement learning.
Authors
(none)
Tags
Stats
Related papers
- Model-based Reinforcement Learning With Double Oracle Efficiency In Policy Optimization And Offline Estimation (2026)0.00
- Risk Bounds And Rademacher Complexity In Batch Reinforcement Learning (2021)0.00
- Off-policy Deep Reinforcement Learning Without Exploration (2018)0.00
- Causality And Batch Reinforcement Learning: Complementary Approaches To Planning In Unknown Domains (2020)0.00
- Stackelberg Batch Policy Learning (2023)0.00
- Oracle Inequalities For Model Selection In Offline Reinforcement Learning (2022)0.00
- Interpretable Performance Analysis Towards Offline Reinforcement Learning: A Dataset Perspective (2021)0.00
- Distributionally Robust Offline Reinforcement Learning With Linear Function Approximation (2022)0.00