Reinforcement Learning: A Comparison Of UCB Versus Alternative Adaptive Policies
2019 · Wesley Cowan, Michael N. Katehakis, Daniel Pirutinsky
Abstract
In this paper we consider the basic version of Reinforcement Learning (RL), which involves computing optimal data-driven (adaptive) policies for Markovian decision processes with unknown transition probabilities. We provide a brief survey of the state of the art in the area, and we compare the performance of the classic UCB policy of \cite{bkmdp97} with that of a new policy developed herein, which we call MDP-Deterministic Minimum Empirical Divergence (MDP-DMED), and of a method based on posterior sampling (MDP-PS).
Related papers
- On Learning History Based Policies For Controlling Markov Decision Processes (2022)
- Contrastive UCB: Provably Efficient Contrastive Self-supervised Learning In Online Reinforcement Learning (2022)
- Online Bayesian Risk-averse Reinforcement Learning (2025)
- Unified Algorithms For RL With Decision-estimation Coefficients: PAC, Reward-free, Preference-based Learning, And Beyond (2022)
- Robust Model-based Reinforcement Learning With An Adversarial Auxiliary Model (2024)
- A General Markov Decision Process Framework For Directly Learning Optimal Control Policies (2019)
- Upside-down Reinforcement Learning For More Interpretable Optimal Control (2024)
- Efficient Model-based Reinforcement Learning Through Optimistic Policy Search And Planning (2020)