Instance-dependent Near-optimal Policy Identification In Linear MDPs Via Online Experiment Design
2022 · Andrew Wagenmaker, Kevin Jamieson
Abstract
While much progress has been made in understanding the minimax sample complexity of reinforcement learning (RL) -- the complexity of learning on the "worst-case" instance -- such measures of complexity often do not capture the true difficulty of learning. In practice, on an "easy" instance, we might hope to achieve a complexity far better than that achievable on the worst-case instance. In this work we seek to understand the "instance-dependent" complexity of learning near-optimal policies (PAC RL) in the setting of RL with linear function approximation. We propose an algorithm, Pedel, which achieves a fine-grained instance-dependent measure of complexity, the first of its kind in the RL with function approximation setting, thereby capturing the difficulty of learning on each particular problem instance. Through an explicit example, we show that Pedel yields provable gains over low-regret, minimax-optimal algorithms and that such algorithms are unable to hit the i
Related papers
- Nearly Minimax Optimal Offline Reinforcement Learning With Linear Function Approximation: Single-agent MDP And Markov Game (2022)
- Sample And Oracle Efficient Reinforcement Learning For MDPs With Linearly-realizable Value Functions (2024)
- Policy Finetuning: Bridging Sample-efficient Offline And Online Reinforcement Learning (2021)
- Beyond No Regret: Instance-dependent PAC Reinforcement Learning (2021)
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable MDPs With Limited Revisiting (2021)
- Minimax Optimal And Computationally Efficient Algorithms For Distributionally Robust Offline Reinforcement Learning (2024)
- Optimistic Policy Optimization Is Provably Efficient In Non-stationary MDPs (2021)
- Low-switching Policy Gradient With Exploration Via Online Sensitivity Sampling (2023)