Reward-free RL Is No Harder Than Reward-aware RL In Linear Markov Decision Processes
2022 Β· Andrew Wagenmaker, Yifang Chen, Max Simchowitz, et al.
Abstract
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy for an arbitrary reward function revealed only after exploring. In the the tabular setting, it is well known that this is a more difficult problem than reward-aware (PAC) RL -- where the agent has access to the reward function during exploration -- with optimal sample complexities in the two settings differing by a factor of \(|\mathcal\{S\}|\), the size of the state space. We show that this separation does not exist in the setting of linear MDPs. We first develop a computationally efficient algorithm for reward-free RL in a \(d\)-dimensional linear MDP with sample complexity scaling as \(\widetilde\{\mathcal\{O\}\}(d^2 H^5/\epsilon^2)\). We then show a lower bound with matching dimension-dependence of \(Ξ©(d^2 H^2/\epsilon^2)\), which holds for the reward-aware RL setting. To our knowledge, our approach is the
Authors
(none)
Tags
Stats
Related papers
- Improved Sample Complexity For Reward-free Reinforcement Learning Under Low-rank Mdps (2023)0.00
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)0.00
- Optimal Horizon-free Reward-free Exploration For Linear Mixture Mdps (2023)0.00
- Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity (2021)0.00
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)0.00
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable Mdps With Limited Revisiting (2021)0.00
- Sample And Oracle Efficient Reinforcement Learning For Mdps With Linearly-realizable Value Functions (2024)0.00
- On Reward-free RL With Kernel And Neural Function Approximations: Single-agent MDP And Markov Game (2021)0.00