Sample And Oracle Efficient Reinforcement Learning For Mdps With Linearly-realizable Value Functions
2024 Β· Zakaria Mhammedi
Abstract
Designing sample-efficient and computationally feasible reinforcement learning (RL) algorithms is particularly challenging in environments with large or infinite state and action spaces. In this paper, we advance this effort by presenting an efficient algorithm for Markov Decision Processes (MDPs) where the state-action value function of any policy is linear in a given feature map. This challenging setting can model environments with infinite states and actions, strictly generalizes classic linear MDPs, and currently lacks a computationally efficient algorithm under online access to the MDP. Specifically, we introduce a new RL algorithm that efficiently finds a near-optimal policy in this setting, using a number of episodes and calls to a cost-sensitive classification (CSC) oracle that are both polynomial in the problem parameters. Notably, our CSC oracle can be efficiently implemented when the feature dimension is constant, representing a clear improvement over state-of-the-art method
Authors
(none)
Tags
Stats
Related papers
- Sample-efficient Reinforcement Learning Is Feasible For Linearly Realizable Mdps With Limited Revisiting (2021)0.00
- Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity (2021)0.00
- Sample-efficient Reinforcement Learning For Linearly-parameterized Mdps With A Generative Model (2021)0.00
- A Primal-dual Algorithm For Offline Constrained Reinforcement Learning With Linear Mdps (2024)0.00
- Online RL In Linearly \(q^\pi\)-realizable Mdps Is As Easy As In Linear Mdps If You Learn What To Ignore (2023)0.00
- Reward-free Model-based Reinforcement Learning With Linear Function Approximation (2021)0.00
- Optimal Horizon-free Reward-free Exploration For Linear Mixture Mdps (2023)0.00
- Instance-dependent Near-optimal Policy Identification In Linear Mdps Via Online Experiment Design (2022)0.00