Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity
2021 · Dhruv Malik, Aldo Pacchiano, Vishwak Srinivasan, et al.
Abstract
Reinforcement learning (RL) is empirically successful in complex nonlinear Markov decision processes (MDPs) with continuous state spaces. By contrast, most of the theoretical RL literature requires the MDP to satisfy some form of linear structure in order to guarantee sample-efficient RL. Such efforts typically assume that the transition dynamics or value function of the MDP are described by linear functions of the state features. To resolve this discrepancy between theory and practice, we introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions. We demonstrate that the EPW condition permits sample-efficient RL by providing an algorithm that provably solves MDPs satisfying this condition. Our algorithm requires minimal assumptions on the policy class, which can include multi-layer neural networks with nonlinear activation functions. Notably, the EPW condition is directly motivated by popular gaming benchmarks, and we
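To make the contrast in the abstract concrete, here is a minimal Python sketch of the two kinds of function involved. It is not the paper's algorithm and does not implement the EPW condition. The feature map `phi(s) = [1, s]`, the helper names, and all weights are hypothetical choices for illustration. `linear_value` shows the kind of linear structure the abstract says most theory assumes (a value function that is a linear function of state features), while `mlp_policy` is a tiny two-layer ReLU network, the kind of nonlinear policy class the abstract says the algorithm can accommodate.

```python
from typing import Callable, List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    # Inner product of two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))


def phi(s: float) -> Vector:
    # Hypothetical feature map for a 1-D continuous state s (an assumption, not from the paper).
    return [1.0, s]


def linear_value(theta: Vector) -> Callable[[float], float]:
    # Linear-structure assumption: V(s) = <theta, phi(s)> for a fixed weight vector theta.
    return lambda s: dot(theta, phi(s))


def relu(x: float) -> float:
    return max(0.0, x)


def mlp_policy(W1: List[Vector], b1: Vector, w2: Vector, b2: float) -> Callable[[float], int]:
    # Two-layer policy: ReLU hidden layer on phi(s), then a threshold on a linear readout.
    def policy(s: float) -> int:
        hidden = [relu(dot(row, phi(s)) + b) for row, b in zip(W1, b1)]
        score = dot(w2, hidden) + b2
        return 1 if score > 0 else 0

    return policy


# Hand-picked (hypothetical) weights: hidden = [relu(s), relu(-s)], so score = |s| - 0.5.
V = linear_value([0.5, 2.0])
pi = mlp_policy(W1=[[0.0, 1.0], [0.0, -1.0]], b1=[0.0, 0.0], w2=[1.0, 1.0], b2=-0.5)
```

With these weights the policy picks action 1 exactly when |s| > 0.5, a decision rule that no single linear threshold on the raw state s can express, whereas `V` stays linear in the features by construction.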