On The Convergence Of Reinforcement Learning With Monte Carlo Exploring Starts
2020 · Jun Liu
Abstract
A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method, also known as optimistic policy iteration, in which the value function is approximated by simulated returns and a greedy policy is selected at each iteration. The convergence of this algorithm in the general setting has been an open question. In this paper, we investigate the convergence of this algorithm for the case with undiscounted costs, also known as the stochastic shortest path problem. The results complement existing partial results on this topic and thereby help to further settle the open problem. As a side result, we also provide a proof of a version of the supermartingale convergence theorem commonly used in stochastic approximation.
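To make the algorithm described in the abstract concrete, here is a minimal Python sketch of MCES on a toy stochastic shortest path problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the three-state chain, the `SAFE`/`RISKY` actions and their costs, the every-visit running average used to estimate Q-values, and the `max_steps` cap that discards episodes which fail to reach the goal. The variant analyzed in the paper may differ in its details.

```python
import random

GOAL = 3                 # absorbing, cost-free goal state (hypothetical toy problem)
SAFE, RISKY = 0, 1       # the two actions available in states 0, 1, 2

def step(state, action, rng):
    """Return (cost, next_state) for one transition of the toy problem."""
    if action == SAFE:
        return 1.0, state + 1          # always advance by one state
    # RISKY: cheaper, reaches the goal half the time, otherwise stays put
    return 0.8, (GOAL if rng.random() < 0.5 else state)

def greedy(q_row):
    """Index of the minimum-cost action (ties go to the lower index)."""
    return min(range(len(q_row)), key=lambda a: q_row[a])

def mces(num_episodes=20000, max_steps=1000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(GOAL)]       # Q-value estimates
    visits = [[0, 0] for _ in range(GOAL)]      # sample counts per pair
    policy = [SAFE] * GOAL
    for _ in range(num_episodes):
        # Exploring start: a uniformly random initial state-action pair.
        state, action = rng.randrange(GOAL), rng.choice((SAFE, RISKY))
        trajectory = []
        while state != GOAL and len(trajectory) < max_steps:
            cost, nxt = step(state, action, rng)
            trajectory.append((state, action, cost))
            state = nxt
            if state != GOAL:
                action = policy[state]          # follow the current policy
        if state != GOAL:
            continue                            # truncated episode: discard
        # Walk backwards, accumulating the undiscounted cost-to-go.
        g = 0.0
        for s, a, cost in reversed(trajectory):
            g += cost
            visits[s][a] += 1
            q[s][a] += (g - q[s][a]) / visits[s][a]   # running average
            policy[s] = greedy(q[s])                  # greedy (min-cost) update
    return q, policy

if __name__ == "__main__":
    q, policy = mces()
    print("greedy policy:", policy)
    print("Q estimates:", q)
```

In this toy problem every policy reaches the goal with probability one, so the episode cap is only a safeguard. In a general stochastic shortest path problem a greedy policy may be improper and never terminate, which is one reason the undiscounted case needs careful treatment.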
Related papers
- On The Convergence Of The Monte Carlo Exploring Starts Algorithm For Reinforcement Learning (2020)
- Finite-sample Analysis Of The Monte Carlo Exploring Starts Algorithm For Reinforcement Learning (2024)
- On The Convergence Of Policy Iteration-based Reinforcement Learning With Monte Carlo Policy Evaluation (2023)
- Renewal Monte Carlo: Renewal Theory Based Reinforcement Learning (2018)
- Guided Exploration In Reinforcement Learning Via Monte Carlo Critic Optimization (2022)
- Conservative Exploration In Reinforcement Learning (2020)
- Model-based Exploration In Monitored Markov Decision Processes (2025)
- Probabilistic Insights For Efficient Exploration Strategies In Reinforcement Learning (2025)