On Oracle-efficient PAC RL With Rich Observations
2018 Β· Christoph Dann, Nan Jiang, Akshay Krishnamurthy, et al.
Abstract
We study the computational tractability of PAC reinforcement learning with rich observations. We present new provably sample-efficient algorithms for environments with deterministic hidden state dynamics and stochastic rich observations. These methods operate in an oracle model of computation -- accessing policy and value function classes exclusively through standard optimization primitives -- and therefore represent computationally efficient alternatives to prior algorithms that require enumeration. With stochastic hidden state dynamics, we prove that the only known sample-efficient algorithm, OLIVE, cannot be implemented in the oracle model. We also present several examples that illustrate fundamental challenges of tractable PAC reinforcement learning in such general settings.
Authors
(none)
Tags
Stats
Related papers
- Sample And Oracle Efficient Reinforcement Learning For Mdps With Linearly-realizable Value Functions (2024)0.00
- Model-based RL In Contextual Decision Processes: PAC Bounds And Exponential Improvements Over Model-free Approaches (2018)0.00
- Kinematic State Abstraction And Provably Efficient Rich-observation Reinforcement Learning (2019)0.00
- Computationally Efficient PAC RL In Pomdps With Latent Determinism And Conditional Embeddings (2022)0.00
- Computably Continuous Reinforcement-learning Objectives Are Pac-learnable (2023)7.81
- Model-based Reinforcement Learning With Double Oracle Efficiency In Policy Optimization And Offline Estimation (2026)0.00
- When Simple Exploration Is Sample Efficient: Identifying Sufficient Conditions For Random Exploration To Yield PAC RL Algorithms (2018)0.00
- Provable Representation With Efficient Planning For Partial Observable Reinforcement Learning (2023)0.00