Primal-dual \(\pi\) Learning: Sample Complexity And Sublinear Run Time For Ergodic Markov Decision Problems
2017 Β· Mengdi Wang
Abstract
Consider the problem of approximating the optimal policy of a Markov decision process (MDP) by sampling state transitions. In contrast to existing reinforcement learning methods that are based on successive approximations to the nonlinear Bellman equation, we propose a Primal-Dual \(\pi\) Learning method in light of the linear duality between the value and policy. The \(\pi\) learning method is model-free and makes primal-dual updates to the policy and value vectors as new data are revealed. For infinite-horizon undiscounted Markov decision process with finite state space \(S\) and finite action space \(A\), the \(\pi\) learning method finds an \(\epsilon\)-optimal policy using the following number of sample transitions $\( \tilde\{O\}( \frac\{(\tau\cdot t^*_\{mix\})^2 |S| |A| \}\{\epsilon^2\} ),\)\( where \)t^*_\{mix\}\( is an upper bound of mixing times across all policies and \)\tau\( is a parameter characterizing the range of stationary distributions across policies. The \)\pi\( le
Authors
(none)
Tags
Stats
Related papers
- Stochastic Primal-dual Methods And Sample Complexity Of Reinforcement Learning (2016)0.00
- Deep Primal-dual Reinforcement Learning: Accelerating Actor-critic Using Bellman Duality (2017)0.00
- Optimal Sample Complexity For Average Reward Markov Decision Processes (2023)0.00
- Model-free Reinforcement Learning: From Clipped Pseudo-regret To Sample Complexity (2020)0.00
- Efficient Performance Bounds For Primal-dual Reinforcement Learning From Demonstrations (2021)0.00
- A Primal-dual Algorithm For Offline Constrained Reinforcement Learning With Linear Mdps (2024)0.00
- Projection By Convolution: Optimal Sample Complexity For Reinforcement Learning In Continuous-space Mdps (2024)0.00
- Bayesian Learning Of Optimal Policies In Markov Decision Processes With Countably Infinite State-space (2023)0.00