Improved Sample Complexity Analysis Of Natural Policy Gradient Algorithm With General Parameterization For Infinite Horizon Discounted Reward Markov Decision Processes
2023 Β· Washim Uddin Mondal, Vaneet Aggarwal
Abstract
We consider the problem of designing sample efficient learning algorithms for infinite horizon discounted reward Markov Decision Process. Specifically, we propose the Accelerated Natural Policy Gradient (ANPG) algorithm that utilizes an accelerated stochastic gradient descent process to obtain the natural policy gradient. ANPG achieves \(\mathcal\{O\}(\{\epsilon^\{-2\}\})\) sample complexity and \(\mathcal\{O\}(\epsilon^\{-1\})\) iteration complexity with general parameterization where \(\epsilon\) defines the optimality error. This improves the state-of-the-art sample complexity by a \(log(\frac\{1\}\{\epsilon\})\) factor. ANPG is a first-order algorithm and unlike some existing literature, does not require the unverifiable assumption that the variance of importance sampling (IS) weights is upper bounded. In the class of Hessian-free and IS-free algorithms, ANPG beats the best-known sample complexity by a factor of \(\mathcal\{O\}(\epsilon^\{-\frac\{1\}\{2\}\})\) and simultaneously ma
Authors
(none)
Tags
Stats
Related papers
- Stochastic Policy Gradient Methods: Improved Sample Complexity For Fisher-non-degenerate Policies (2023)0.00
- On The Linear Convergence Of Natural Policy Gradient Algorithm (2021)0.00
- Provably Fast Convergence Of Independent Natural Policy Gradient For Markov Potential Games (2023)0.00
- Fast Global Convergence Of Natural Policy Gradient Methods With Entropy Regularization (2020)0.00
- Symmetric (optimistic) Natural Policy Gradient For Multi-agent Learning With Parameter Convergence (2022)0.00
- Global Convergence Of Natural Policy Gradient With Hessian-aided Momentum Variance Reduction (2024)0.00
- Optimistic Natural Policy Gradient: A Simple Efficient Policy Optimization Framework For Online RL (2023)0.00
- Anchor-changing Regularized Natural Policy Gradient For Multi-objective Reinforcement Learning (2022)0.00