Renewal Monte Carlo: Renewal Theory Based Reinforcement Learning
2018 · Jayakumar Subramanian, Aditya Mahajan
Abstract
In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite-horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm that retains the advantages of Monte Carlo methods, including low bias, simplicity, and ease of implementation, while circumventing their key drawbacks of high variance and delayed (end-of-episode) updates. The key ideas behind RMC are as follows. First, under any reasonable policy, the reward process is ergodic. So, by renewal theory, the performance of a policy is equal to the ratio of the expected discounted reward to the expected discounted time over a regenerative cycle. Second, by carefully examining the expression for the performance gradient, we propose a stochastic approximation algorithm that only requires estimates of the expected discounted reward and discounted time over a regenerative cycle and their gradients. We propose two unbiased estimators for evaluat
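The renewal relationship described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical two-state Markov chain under a fixed policy (the transition matrix `P`, the per-state rewards `REWARD`, and all function names are made up for illustration and do not come from the paper). A regenerative cycle is taken to be the stretch between successive visits to the start state. The sketch also assumes performance means the unnormalized discounted return from the start state, in which case the ratio carries a factor of 1/(1 - γ); under a normalized definition that factor would drop out. Only policy evaluation is shown, not the gradient step or the paper's estimators.

```python
import random

GAMMA = 0.9
P = [[0.5, 0.5], [0.3, 0.7]]  # hypothetical transition probabilities under a fixed policy
REWARD = [1.0, 0.5]           # hypothetical reward received in each state
START = 0                     # designated start state; returning here ends a cycle


def run_cycle(rng):
    """Simulate one regenerative cycle; return (discounted reward, discounted time)."""
    state, discount = START, 1.0
    disc_reward = 0.0
    disc_time = 0.0
    while True:
        disc_reward += discount * REWARD[state]
        disc_time += discount
        state = 0 if rng.random() < P[state][0] else 1
        discount *= GAMMA
        if state == START:  # renewal: the process restarts from the start state
            return disc_reward, disc_time


def renewal_estimate(num_cycles, seed=0):
    """Estimate the start-state value as R_hat / ((1 - gamma) * T_hat)."""
    rng = random.Random(seed)
    total_reward = 0.0
    total_time = 0.0
    for _ in range(num_cycles):
        r, t = run_cycle(rng)
        total_reward += r
        total_time += t
    r_hat = total_reward / num_cycles
    t_hat = total_time / num_cycles
    return r_hat / ((1.0 - GAMMA) * t_hat)


def exact_value(iterations=2000):
    """Reference value from iterating the Bellman equation for the fixed policy."""
    v = [0.0, 0.0]
    for _ in range(iterations):
        v = [REWARD[s] + GAMMA * sum(P[s][n] * v[n] for n in range(2))
             for s in range(2)]
    return v[START]


if __name__ == "__main__":
    print("renewal estimate:", renewal_estimate(20000))
    print("exact value:     ", exact_value())
```

Each cycle contributes to the running averages as soon as the chain returns to the start state, so estimates can be refreshed cycle by cycle rather than at the end of a long episode.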
Related papers
- Regret-optimal Model-free Reinforcement Learning For Discounted MDPs With Short Burn-in Time (2023)
- On The Convergence Of Reinforcement Learning With Monte Carlo Exploring Starts (2020)
- Online Reinforcement Learning In Markov Decision Process Using Linear Programming (2023)
- On The Convergence Of The Monte Carlo Exploring Starts Algorithm For Reinforcement Learning (2020)
- Reinforcement Learning For Infinite-horizon Average-reward Linear MDPs Via Approximation By Discounted-reward MDPs (2024)
- On The Convergence Of Policy Iteration-based Reinforcement Learning With Monte Carlo Policy Evaluation (2023)
- Model-free Reinforcement Learning In Infinite-horizon Average-reward Markov Decision Processes (2019)
- A Policy Gradient Approach For Finite Horizon Constrained Markov Decision Processes (2022)