Posterior Sampling For Large Scale Reinforcement Learning
2017 · Georgios Theocharous, Zheng Wen, Yasin Abbasi-Yadkori, et al.
Abstract
We propose a practical non-episodic PSRL algorithm that, unlike recent state-of-the-art PSRL algorithms, uses a deterministic, model-independent episode switching schedule. Our algorithm, termed deterministic schedule PSRL (DS-PSRL), is efficient in terms of time, sample, and space complexity. We prove a Bayesian regret bound under mild assumptions. Our result is more generally applicable to multiple parameters and continuous state-action problems. We compare our algorithm with state-of-the-art PSRL algorithms on standard discrete and continuous problems from the literature. Finally, we show how the assumptions of our algorithm satisfy a sensible parametrization for a large class of problems in sequential recommendations.
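The idea of a deterministic, model-independent switching schedule can be illustrated with a small sketch. The abstract does not state the schedule itself, so the doubling rule below (resample at t = 1, 2, 4, 8, ...) is an assumption made for illustration, and the two-armed Bernoulli problem with Beta posteriors is a single-state stand-in, not the setting or the algorithm analyzed in the paper. The names `doubling_switch_times` and `run_scheduled_sampling` are hypothetical.

```python
import random


def doubling_switch_times(horizon):
    """Return the 1-indexed time steps at which a new posterior sample is drawn.

    Assumption: episode lengths double, so switches occur at t = 1, 2, 4, ...
    The schedule depends only on t, never on the model or the data seen.
    """
    times = []
    t = 1
    while t <= horizon:
        times.append(t)
        t *= 2
    return times


def run_scheduled_sampling(horizon, true_means, seed=0):
    """Posterior sampling on a Bernoulli bandit, resampling only at switch times.

    The posterior is updated at every step, but the sampled model (and hence
    the chosen arm) changes only when the deterministic schedule says so.
    Returns (total_reward, number_of_resamples).
    """
    rng = random.Random(seed)
    switch_times = set(doubling_switch_times(horizon))
    n_arms = len(true_means)
    alpha = [1.0] * n_arms  # Beta(1, 1) prior: successes + 1
    beta = [1.0] * n_arms   # Beta(1, 1) prior: failures + 1
    arm = 0
    resamples = 0
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t in switch_times:
            # Draw one model from the posterior and act greedily on it
            # until the next scheduled switch.
            draws = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
            arm = max(range(n_arms), key=lambda a: draws[a])
            resamples += 1
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        total_reward += reward

    return total_reward, resamples
```

Under this assumed schedule the number of posterior samples grows only logarithmically with the horizon, which is one way a fixed schedule can keep the computational cost low.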
Related papers
- Model-based Reinforcement Learning For Continuous Control With Posterior Sampling (2020)
- Why Is Posterior Sampling Better Than Optimism For Reinforcement Learning? (2016)
- Posterior Sampling For Continuing Environments (2022)
- Optimistic Posterior Sampling For Reinforcement Learning With Few Samples And Tight Guarantees (2022)
- Dueling Posterior Sampling For Preference-based Reinforcement Learning (2019)
- Breaking The Sample Complexity Barrier To Regret-optimal Model-free Reinforcement Learning (2021)
- Posterior Sampling-based Online Learning For Episodic POMDPs (2023)
- Episodic Reinforcement Learning With Expanded State-reward Space (2024)