A Few Expert Queries Suffices For Sample-efficient RL With Resets And Linear Value Approximation
2022 · Philip Amortila, Nan Jiang, Dhruv Madeka, et al.
Abstract
The current paper studies sample-efficient Reinforcement Learning (RL) in settings where only the optimal value function is assumed to be linearly-realizable. It has recently been understood that, even under this seemingly strong assumption and access to a generative model, worst-case sample complexities can be prohibitively (i.e., exponentially) large. We investigate the setting where the learner additionally has access to interactive demonstrations from an expert policy, and we present a statistically and computationally efficient algorithm (Delphi) for blending exploration with expert queries. In particular, Delphi requires \(\tilde{\mathcal{O}}(d)\) expert queries and a \(\texttt{poly}(d,H,|\mathcal{A}|,1/\epsilon)\) amount of exploratory samples to provably recover an \(\epsilon\)-suboptimal policy. Compared to pure RL approaches, this corresponds to an exponential improvement in sample complexity with surprisingly-little expert input. Compared to prior imitation learning
Related papers
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable MDPs With Limited Revisiting (2021) 0.00
- Provably Efficient Reinforcement Learning With Linear Function Approximation (2019) 11.76
- The Effective Horizon Explains Deep RL Performance In Stochastic Environments (2023) 3.42
- Sample Complexity Of Offline Reinforcement Learning With Deep ReLU Networks (2021) 0.00
- Constraint Sampling Reinforcement Learning: Incorporating Expertise For Faster Learning (2021) 0.00
- A Nearly Optimal And Low-switching Algorithm For Reinforcement Learning With General Function Approximation (2023) 0.00
- Continuous Action Reinforcement Learning From A Mixture Of Interpretable Experts (2020) 0.00
- Sample And Oracle Efficient Reinforcement Learning For MDPs With Linearly-realizable Value Functions (2024) 0.00