Last-iterate Convergence Of General Parameterized Policies In Constrained MDPs
2026 · Washim Uddin Mondal, Vaneet Aggarwal
Abstract
arXiv:2408.11513v2
This paper focuses on learning a Constrained Markov Decision Process (CMDP) via general parameterized policies. We propose a Primal-Dual based Regularized Accelerated Natural Policy Gradient (PDR-ANPG) algorithm that uses entropy and quadratic regularizers to reach this goal. For parameterized policy classes with a transferred compatibility approximation error, \(\epsilon_{\mathrm{bias}}\), PDR-ANPG achieves a last-iterate \(\epsilon\) optimality gap and \(\epsilon\) constraint violation with a sample complexity of \(\tilde{\mathcal{O}}(\epsilon^{-2}\min\{\epsilon^{-2},\epsilon_{\mathrm{bias}}^{-\frac{1}{3}}\})\). If the class is incomplete (\(\epsilon_{\mathrm{bias}}>0\)), then the sample complexity reduces to \(\tilde{\mathcal{O}}(\epsilon^{-2})\) for \(\epsilon<(\epsilon_{\mathrm{bias}})^{\frac{1}{6}}\). Moreover, for complete policies with \(\epsilon_{\mathrm{bias}}=0\), our algorit
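The two regimes named in the abstract can be read off the stated bound by checking which term attains the minimum. The following is a worked sketch, not taken from the paper: the symbol \(N\) for the sample complexity is introduced here for illustration, \(\epsilon_{\mathrm{bias}}\) is treated as a fixed constant of the policy class, and \(\epsilon_{\mathrm{bias}}^{-\frac{1}{3}}\) is read as \(+\infty\) when \(\epsilon_{\mathrm{bias}}=0\).

```latex
% Bound as stated in the abstract; N is an assumed symbol for the sample complexity.
N = \tilde{\mathcal{O}}\!\left(\epsilon^{-2}\min\left\{\epsilon^{-2},\ \epsilon_{\mathrm{bias}}^{-\frac{1}{3}}\right\}\right)

% Which term attains the minimum when \epsilon_{\mathrm{bias}} > 0:
\epsilon^{-2} \le \epsilon_{\mathrm{bias}}^{-\frac{1}{3}}
\iff \epsilon^{2} \ge \epsilon_{\mathrm{bias}}^{\frac{1}{3}}
\iff \epsilon \ge \epsilon_{\mathrm{bias}}^{\frac{1}{6}}

% Incomplete class, \epsilon < \epsilon_{\mathrm{bias}}^{1/6}: the minimum is \epsilon_{\mathrm{bias}}^{-1/3}.
% With \epsilon_{\mathrm{bias}} held fixed (assumption), that factor is a constant in \epsilon.
N = \tilde{\mathcal{O}}\!\left(\epsilon^{-2}\,\epsilon_{\mathrm{bias}}^{-\frac{1}{3}}\right)
  = \tilde{\mathcal{O}}\!\left(\epsilon^{-2}\right)

% Complete class, \epsilon_{\mathrm{bias}} = 0: the second term is +\infty (convention assumed here),
% so the minimum is \epsilon^{-2}.
N = \tilde{\mathcal{O}}\!\left(\epsilon^{-2}\cdot\epsilon^{-2}\right)
  = \tilde{\mathcal{O}}\!\left(\epsilon^{-4}\right)
```

The last line follows only from the displayed formula under the stated convention; the abstract text is cut off before it states the complete-class rate.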
Related papers
- Learning General Parameterized Policies For Infinite Horizon Average Reward Constrained MDPs Via Primal-dual Policy Gradient Algorithm (2024)
- A Policy Gradient Primal-dual Algorithm For Constrained MDPs With Uniform PAC Guarantees (2024)
- Policy Optimization For Constrained MDPs With Provable Fast Global Convergence (2021)
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025)
- Learning Deterministic Policies With Policy Gradients In Constrained Markov Decision Processes (2025)
- Policy Gradient For Robust Markov Decision Processes (2024)
- Confident Natural Policy Gradient For Local Planning In \(q_\pi\)-realizable Constrained MDPs (2024)
- On The Theory Of Policy Gradient Methods: Optimality, Approximation, And Distribution Shift (2019)