More Efficient Randomized Exploration For Reinforcement Learning Via Approximate Sampling
2024 · Haque Ishfaq, Yixin Tan, Yu Yang, et al.
Abstract
Thompson sampling (TS) is one of the most popular exploration techniques in reinforcement learning (RL). However, most TS algorithms with theoretical guarantees are difficult to implement and do not generalize to deep RL. While the emerging approximate sampling-based exploration schemes are promising, most existing algorithms are specific to linear Markov Decision Processes (MDPs) with suboptimal regret bounds, or only use the most basic samplers such as Langevin Monte Carlo. In this work, we propose an algorithmic framework that incorporates different approximate sampling methods with the recently proposed Feel-Good Thompson Sampling (FGTS) approach (Zhang, 2022; Dann et al., 2021), which was previously known to be computationally intractable in general. When applied to linear MDPs, our regret analysis yields the best known dependency of regret on dimensionality, surpassing existing randomized algorithms. Additionally, we provide explicit sampling complexity for each employed sampler.
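To make the "basic sampler" mentioned in the abstract concrete, the following is a minimal sketch of unadjusted Langevin Monte Carlo drawing one approximate posterior sample of a linear model's weights. It is not the paper's algorithm and does not include the Feel-Good term; the ridge-regression posterior with unit noise variance, the toy data, and every name here (`lmc_sample`, `grad_neg_log_posterior`, `make_toy_data`) are assumptions made for illustration only.

```python
import math
import random


def grad_neg_log_posterior(w, xs, ys, lam):
    """Gradient of U(w) = 0.5 * sum_i (x_i . w - y_i)^2 + 0.5 * lam * |w|^2.

    Assumes a Gaussian likelihood with unit noise variance and a Gaussian prior.
    """
    grad = [lam * wi for wi in w]
    for x, y in zip(xs, ys):
        residual = sum(xi * wi for xi, wi in zip(x, w)) - y
        for j, xj in enumerate(x):
            grad[j] += residual * xj
    return grad


def lmc_sample(xs, ys, lam=1.0, step=1e-3, n_steps=500, rng=None):
    """Run unadjusted Langevin dynamics and return the final iterate.

    Update rule: w <- w - step * grad U(w) + sqrt(2 * step) * standard normal noise.
    """
    if rng is None:
        rng = random.Random(0)
    w = [0.0] * len(xs[0])
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        grad = grad_neg_log_posterior(w, xs, ys, lam)
        w = [wi - step * gi + noise_scale * rng.gauss(0.0, 1.0)
             for wi, gi in zip(w, grad)]
    return w


def make_toy_data(n=200, seed=1):
    """Hypothetical 2-D regression data with known weights, for illustration."""
    rng = random.Random(seed)
    w_true = [1.0, -2.0]
    xs = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    ys = [sum(a * b for a, b in zip(x, w_true)) + 0.1 * rng.gauss(0.0, 1.0)
          for x in xs]
    return xs, ys, w_true


if __name__ == "__main__":
    xs, ys, w_true = make_toy_data()
    print("true weights:  ", w_true)
    print("sampled weights:", lmc_sample(xs, ys))
```

In a Thompson-sampling loop, a draw like this would stand in for an exact posterior sample before acting greedily with respect to it. The step size must stay small relative to the curvature of `U`, which is why the sketch fixes it at `1e-3` for this data scale.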
Related papers
- A Provably Efficient Model-free Posterior Sampling Method For Episodic Reinforcement Learning (2022)
- Prior-dependent Analysis Of Posterior Sampling Reinforcement Learning With Function Approximation (2024)
- Langevin Thompson Sampling With Logarithmic Communication: Bandits And Reinforcement Learning (2023)
- Provably Efficient Exploration In Constrained Reinforcement Learning: Posterior Sampling Is All You Need (2023)
- Efficient Exploration In Average-reward Constrained Reinforcement Learning: Achieving Near-optimal Regret With Posterior Sampling (2024)
- Posterior Sampling For Reinforcement Learning: Worst-case Regret Bounds (2017)
- Online Sub-sampling For Reinforcement Learning With General Function Approximation (2021)
- Making RL With Preference-based Feedback Efficient Via Randomization (2023)