Stochastic Primal-dual Methods And Sample Complexity Of Reinforcement Learning
2016 Β· Yichen Chen, Mengdi Wang
Abstract
We study the online estimation of the optimal policy of a Markov decision process (MDP). We propose a class of Stochastic Primal-Dual (SPD) methods which exploit the inherent minimax duality of Bellman equations. The SPD methods update a few coordinates of the value and policy estimates as a new state transition is observed. These methods use small storage and has low computational complexity per iteration. The SPD methods find an absolute-\(\epsilon\)-optimal policy, with high probability, using \(\mathcal\{O\}\left(\frac\{|\mathcal\{S\}|^4 |\mathcal\{A\}|^2\sigma^2 \}\{(1-\gamma)^6\epsilon^2\} \right)\) iterations/samples for the infinite-horizon discounted-reward MDP and \(\mathcal\{O\}\left(\frac\{|\mathcal\{S\}|^4 |\mathcal\{A\}|^2H^6\sigma^2 \}\{\epsilon^2\} \right)\) for the finite-horizon MDP.
Authors
(none)
Tags
Stats
Related papers
- Primal-dual \(\pi\) Learning: Sample Complexity And Sublinear Run Time For Ergodic Markov Decision Problems (2017)0.00
- A Primal-dual Algorithm For Offline Constrained Reinforcement Learning With Linear Mdps (2024)0.00
- Projection By Convolution: Optimal Sample Complexity For Reinforcement Learning In Continuous-space Mdps (2024)0.00
- Model-free Reinforcement Learning: From Clipped Pseudo-regret To Sample Complexity (2020)0.00
- A Near-optimal Primal-dual Method For Off-policy Learning In CMDP (2022)0.00
- Efficient Performance Bounds For Primal-dual Reinforcement Learning From Demonstrations (2021)0.00
- Optimal Sample Complexity For Average Reward Markov Decision Processes (2023)0.00
- A Policy Gradient Primal-dual Algorithm For Constrained Mdps With Uniform PAC Guarantees (2024)0.00