Abstract

This work advances randomized exploration in reinforcement learning (RL) with function approximation modeled by linear mixture MDPs. We establish the first prior-dependent Bayesian regret bound for RL with function approximation, and refine the Bayesian regret analysis for posterior sampling reinforcement learning (PSRL), presenting an upper bound of \(\mathcal{O}(d\sqrt{H^3 T \log T})\), where \(d\) is the dimensionality of the transition kernel, \(H\) the planning horizon, and \(T\) the total number of interactions. This improves on the previous benchmark (Osband and Van Roy, 2014), specialized to linear mixture MDPs, by a factor of \(\mathcal{O}(\sqrt{\log T})\). Our approach takes a value-targeted model learning perspective and introduces a decoupling argument and a variance reduction technique, moving beyond traditional analyses reliant on confidence sets and concentration inequalities to formalize Bayesian regret bounds m

Authors

(none)

Tags

  • Exploration

Stats

  • citations: 0
  • S2 citations: -
  • github stars: 0
  • HF likes: 0
  • heat score: 0.00
  • arxiv key: li2024prior

Related papers