Prior-dependent Analysis Of Posterior Sampling Reinforcement Learning With Function Approximation
2024 · Yingru Li, Zhi-Quan Luo
Abstract
This work advances randomized exploration in reinforcement learning (RL) with function approximation modeled by linear mixture MDPs. We establish the first prior-dependent Bayesian regret bound for RL with function approximation, and we refine the Bayesian regret analysis of posterior sampling reinforcement learning (PSRL), presenting an upper bound of \(\mathcal{O}(d\sqrt{H^3 T \log T})\), where \(d\) is the dimensionality of the transition kernel, \(H\) the planning horizon, and \(T\) the total number of interactions. Specialized to linear mixture MDPs, this improves on the previous benchmark (Osband and Van Roy, 2014) by a factor of \(\mathcal{O}(\sqrt{\log T})\). Our approach, leveraging a value-targeted model learning perspective, introduces a decoupling argument and a variance reduction technique, moving beyond traditional analyses reliant on confidence sets and concentration inequalities to formalize Bayesian regret bounds m
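The value-targeted model learning view behind PSRL can be made concrete with a small sketch. It assumes a linear mixture MDP with \(P(s' \mid s,a) = \langle \phi(s' \mid s,a), \theta \rangle\), so that for a value function \(V\) the feature \(x = \sum_{s'} \phi(s' \mid s,a) V(s')\) satisfies \(\mathbb{E}[V(s')] = \langle x, \theta \rangle\), and the observed \(V(s')\) serves as a regression target. It further assumes, for illustration only, a Gaussian prior \(\theta \sim N(0, I/\lambda)\) and Gaussian regression noise, which makes the posterior over \(\theta\) Gaussian; this is a simplification, not the paper's analysis. The names `value_targeted_feature` and `GaussianThetaPosterior` and the parameters `lam` and `sigma` are hypothetical. A PSRL episode would draw one \(\tilde\theta\) from this posterior and plan in the sampled model; the planner is omitted.

```python
import math
import random


def value_targeted_feature(phi_next, values):
    """x = sum_{s'} phi(s'|s,a) * V(s'), so that E[V(s')] = <x, theta>
    under the (assumed) linear mixture model P(s'|s,a) = <phi(s'|s,a), theta>."""
    d = len(phi_next[0])
    return [sum(phi[i] * v for phi, v in zip(phi_next, values)) for i in range(d)]


def cholesky(a):
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def forward_sub(low, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(low[i][k] * y[k] for k in range(i))) / low[i][i])
    return y


def back_sub_transposed(low, y):
    """Solve L^T x = y, given lower-triangular L."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


class GaussianThetaPosterior:
    """Conjugate posterior for theta, assuming prior N(0, I/lam) and
    noise model y = <x, theta> + N(0, sigma^2) (illustrative assumptions)."""

    def __init__(self, d, lam=1.0, sigma=1.0):
        self.d = d
        self.inv_var = 1.0 / sigma**2
        # Posterior precision starts at the prior precision lam * I.
        self.precision = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.b = [0.0] * d  # accumulates sum_t x_t * y_t / sigma^2

    def update(self, x, y):
        """Add one value-targeted regression pair (x, y)."""
        for i in range(self.d):
            self.b[i] += self.inv_var * x[i] * y
            for j in range(self.d):
                self.precision[i][j] += self.inv_var * x[i] * x[j]

    def mean(self):
        """Posterior mean mu solving precision @ mu = b."""
        low = cholesky(self.precision)
        return back_sub_transposed(low, forward_sub(low, self.b))

    def sample(self, rng):
        """Draw theta_tilde = mu + L^{-T} z; its covariance is precision^{-1}."""
        low = cholesky(self.precision)
        mu = back_sub_transposed(low, forward_sub(low, self.b))
        z = [rng.gauss(0.0, 1.0) for _ in range(self.d)]
        noise = back_sub_transposed(low, z)
        return [m + e for m, e in zip(mu, noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta_star = [0.7, 0.3]  # hypothetical ground-truth mixture weights
    post = GaussianThetaPosterior(d=2, lam=1.0, sigma=0.5)
    for _ in range(2000):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        y = x[0] * theta_star[0] + x[1] * theta_star[1] + rng.gauss(0.0, 0.5)
        post.update(x, y)
    print("posterior mean (should be near theta_star):", post.mean())
    print("one PSRL draw theta_tilde:", post.sample(rng))
```

Working with the Cholesky factor of the precision matrix avoids forming its inverse explicitly; the same factor yields both the posterior mean and a correctly distributed sample.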
Related papers
- Posterior Sampling With Delayed Feedback For Reinforcement Learning With Linear Function Approximation (2023)
- Model-based Reinforcement Learning For Continuous Control With Posterior Sampling (2020)
- Randomized Exploration For Reinforcement Learning With Multinomial Logistic Function Approximation (2024)
- Posterior Sampling For Competitive RL: Function Approximation And Partial Observation (2023)
- Model-based Reinforcement Learning With Multinomial Logistic Function Approximation (2022)
- Non-stationary Reinforcement Learning Under General Function Approximation (2023)
- Uniform-PAC Bounds For Reinforcement Learning With Linear Function Approximation (2021)
- Improved Regret For Efficient Online Reinforcement Learning With Linear Function Approximation (2023)