Finite-sample Analysis Of The Monte Carlo Exploring Starts Algorithm For Reinforcement Learning
2024 Β· Suei-Wen Chen, Keith Ross, Pierre Youssef
Abstract
Monte Carlo Exploring Starts (MCES), which aims to learn the optimal policy using only sample returns, is a simple and natural algorithm in reinforcement learning which has been shown to converge under various conditions. However, the convergence rate analysis for MCES-style algorithms in the form of sample complexity has received very little attention. In this paper we develop a finite sample bound for a modified MCES algorithm which solves the stochastic shortest path problem. To this end, we prove a novel result on the convergence rate of the policy iteration algorithm. This result implies that with probability at least \(1-\delta\), the algorithm returns an optimal policy after \(\tilde\{O\}(SAK^3log^3\frac\{1\}\{\delta\})\) sampled episodes, where \(S\) and \(A\) denote the number of states and actions respectively, \(K\) is a proxy for episode length, and \(\tilde\{O\}\) hides logarithmic factors and constants depending on the rewards of the environment that are assumed to be kno
Authors
(none)
Tags
Stats
Related papers
- On The Convergence Of The Monte Carlo Exploring Starts Algorithm For Reinforcement Learning (2020)0.00
- On The Convergence Of Reinforcement Learning With Monte Carlo Exploring Starts (2020)0.00
- On The Convergence Of Policy Iteration-based Reinforcement Learning With Monte Carlo Policy Evaluation (2023)0.00
- Minimax-optimal Reward-agnostic Exploration In Reinforcement Learning (2023)0.00
- Q-learning With UCB Exploration Is Sample Efficient For Infinite-horizon MDP (2019)0.00
- Guided Exploration In Reinforcement Learning Via Monte Carlo Critic Optimization (2022)0.00
- Model-free Reinforcement Learning: From Clipped Pseudo-regret To Sample Complexity (2020)0.00
- Finite-sample Analysis Of Nonlinear Stochastic Approximation With Applications In Reinforcement Learning (2019)10.35