Asymptotically Optimal Reinforcement Learning In Block Markov Decision Processes
2025 Β· Thomas van Vuren, Fiona Sloothaak, Maarten G. Wolf, et al.
Abstract
The curse of dimensionality renders Reinforcement Learning (RL) impractical in many real-world settings with exponentially large state and action spaces. Yet, many environments exhibit exploitable structure that can accelerate learning. To formalize this idea, we study RL in Block Markov Decision Processes (BMDPs). BMDPs model problems with large observation spaces, but where transition dynamics are fully determined by latent states. Recent advances in clustering methods have enabled the efficient recovery of this latent structure. However, a regret analysis that exploits these techniques to determine their impact on learning performance remained open. We are now addressing this gap by providing a regret analysis that explicitly leverages clustering, demonstrating that accurate latent state estimation can indeed effectively speed up learning. Concretely, this paper analyzes a two-phase RL algorithm for BMDPs that first learns the latent structure through random exploration and then s
Authors
(none)
Tags
Stats
Related papers
- Reinforcement Learning For Non-stationary Markov Decision Processes: The Blessing Of (more) Optimism (2020)0.00
- Model-free Reinforcement Learning For Branching Markov Decision Processes (2021)0.00
- Model-based Reinforcement Learning With Multinomial Logistic Function Approximation (2022)2.26
- Reinforcement Learning In Rich-observation Mdps Using Spectral Methods (2016)0.00
- Provable RL With Exogenous Distractors Via Multistep Inverse Dynamics (2021)0.00
- Nearly Minimax Optimal Reinforcement Learning For Linear Markov Decision Processes (2022)0.00
- Value-biased Maximum Likelihood Estimation For Model-based Reinforcement Learning In Discounted Linear Mdps (2023)0.00
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)3.58