Simple And Optimal Methods For Stochastic Variational Inequalities, II: Markovian Noise And Policy Evaluation In Reinforcement Learning
2020 · Georgios Kotsalis, Guanghui Lan, Tianjiao Li
Abstract
This paper focuses on stochastic variational inequalities (VIs) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior investigations in the literature focused on temporal difference (TD) learning, employing a nonsmooth finite-time analysis motivated by stochastic subgradient descent, which leads to certain limitations. These include the need to analyze a modified TD algorithm that involves projection onto an a priori defined Euclidean ball, a non-optimal convergence rate, and no clear way of deriving the beneficial effects of parallel implementation. Our approach remedies these shortcomings in the broader context of stochastic VIs and, in particular, for stochastic policy evaluation. We develop a variety of simple TD learning type algorithms, motivated by the original TD algorithm, that maintain its simplicity while offering distinct advantages from a non-asy
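For orientation, policy evaluation with linear features can be read as finding a root of the operator F(theta) = E[(phi(s)^T theta - r(s) - gamma * phi(s')^T theta) * phi(s)], and classical TD(0) takes a stochastic step along a single-sample estimate of F drawn from one Markov trajectory. The sketch below illustrates that textbook TD(0) scheme only; it is not any of the algorithms developed in the paper. The 3-state chain, rewards, tabular features, and the step-size schedule 10 / (t + 100) are hypothetical choices made for illustration.

```python
import numpy as np


def td0_linear(P, r, Phi, gamma=0.5, steps=50_000, seed=0):
    """Classical TD(0) with linear features on one Markov trajectory.

    P:   (n, n) transition matrix of the chain induced by the policy
    r:   (n,)   reward received in each state
    Phi: (n, d) feature matrix, row s is phi(s)
    """
    rng = np.random.default_rng(seed)
    n_states, dim = Phi.shape
    theta = np.zeros(dim)
    s = 0  # arbitrary start state; samples are Markovian, not i.i.d.
    for t in range(steps):
        s_next = rng.choice(n_states, p=P[s])
        # Single-sample operator estimate:
        # F_hat(theta) = (phi(s)^T theta - r(s) - gamma * phi(s')^T theta) * phi(s)
        f_hat = (Phi[s] @ theta - r[s] - gamma * (Phi[s_next] @ theta)) * Phi[s]
        eta = 10.0 / (t + 100.0)  # assumed diminishing step size
        theta = theta - eta * f_hat
        s = s_next
    return theta


# Hypothetical 3-state chain with tabular (identity) features.
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.2, 0.0, 0.8]])
r = np.array([1.0, 0.0, -1.0])
Phi = np.eye(3)
gamma = 0.5

theta = td0_linear(P, r, Phi, gamma=gamma)
# With tabular features the root of F is the exact value function,
# which solves the Bellman equation v = r + gamma * P v.
v_exact = np.linalg.solve(np.eye(3) - gamma * P, r)
print("TD(0) estimate:", theta)
print("Exact values:  ", v_exact)
```

No projection step is used here; with these small, well-conditioned inputs the plain iteration stays bounded, which is a property of this toy example and not a general guarantee.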
Related papers
- The ODE Method For Stochastic Approximation And Reinforcement Learning With Markovian Noise (2024)
- Reanalysis Of Variance Reduced Temporal Difference Learning (2020)
- Variational Inference For Model-free And Model-based Reinforcement Learning (2022)
- Accelerated And Instance-optimal Policy Evaluation With Linear Function Approximation (2021)
- On The Convergence And Sample Efficiency Of Variance-reduced Policy Gradient Method (2021)
- Stochastic First-order Methods For Average-reward Markov Decision Processes (2022)
- Revisiting Value Iteration: Unified Analysis Of Discounted And Average-reward Cases (2025)
- Stochastic Variance Reduction For Policy Gradient Estimation (2017)