Policy Optimization For Constrained Mdps With Provable Fast Global Convergence
2021 Β· Tao Liu, Ruida Zhou, Dileep Kalathil, et al.
Abstract
We address the problem of finding the optimal policy of a constrained Markov decision process (CMDP) using a gradient descent-based algorithm. Previous results have shown that a primal-dual approach can achieve an \(\mathcal\{O\}(1/\sqrt\{T\})\) global convergence rate for both the optimality gap and the constraint violation. We propose a new algorithm called policy mirror descent-primal dual (PMD-PD) algorithm that can provably achieve a faster \(\mathcal\{O\}(log(T)/T)\) convergence rate for both the optimality gap and the constraint violation. For the primal (policy) update, the PMD-PD algorithm utilizes a modified value function and performs natural policy gradient steps, which is equivalent to a mirror descent step with appropriate regularization. For the dual update, the PMD-PD algorithm uses modified Lagrange multipliers to ensure a faster convergence rate. We also present two extensions of this approach to the settings with zero constraint violation and sample-based estimation.
Authors
(none)
Tags
Stats
Related papers
- A Policy Gradient Primal-dual Algorithm For Constrained Mdps With Uniform PAC Guarantees (2024)0.00
- Policy Gradient For Robust Markov Decision Processes (2024)0.00
- Learning General Parameterized Policies For Infinite Horizon Average Reward Constrained Mdps Via Primal-dual Policy Gradient Algorithm (2024)0.00
- Optimal Convergence Rate For Exact Policy Mirror Descent In Discounted Markov Decision Processes (2023)0.00
- Mirror Descent Policy Optimisation For Robust Constrained Markov Decision Processes (2025)0.00
- Last-iterate Convergence Of General Parameterized Policies In Constrained Mdps (2026)0.00
- Efficient Policy Optimization In Robust Constrained Mdps With Iteration Complexity Guarantees (2025)0.00
- A Near-optimal Primal-dual Method For Off-policy Learning In CMDP (2022)0.00