A Policy Gradient Primal-dual Algorithm For Constrained Mdps With Uniform PAC Guarantees
2024 Β· Toshinori Kitamura, Tadashi Kozuno, Masahiro Kato, et al.
Abstract
We study a primal-dual (PD) reinforcement learning (RL) algorithm for online constrained Markov decision processes (CMDPs). Despite its widespread practical use, the existing theoretical literature on PD-RL algorithms for this problem only provides sublinear regret guarantees and fails to ensure convergence to optimal policies. In this paper, we introduce a novel policy gradient PD algorithm with uniform probably approximate correctness (Uniform-PAC) guarantees, simultaneously ensuring convergence to optimal policies, sublinear regret, and polynomial sample complexity for any target accuracy. Notably, this represents the first Uniform-PAC algorithm for the online CMDP problem. In addition to the theoretical guarantees, we empirically demonstrate in a simple CMDP that our algorithm converges to optimal policies, while baseline algorithms exhibit oscillatory performance and constraint violation.
Authors
(none)
Tags
Stats
Related papers
- Learning General Parameterized Policies For Infinite Horizon Average Reward Constrained Mdps Via Primal-dual Policy Gradient Algorithm (2024)0.00
- A Near-optimal Primal-dual Method For Off-policy Learning In CMDP (2022)0.00
- Policy Optimization For Constrained Mdps With Provable Fast Global Convergence (2021)0.00
- Provably Efficient Primal-dual Reinforcement Learning For Cmdps With Non-stationary Objectives And Constraints (2022)0.00
- Last-iterate Convergence Of General Parameterized Policies In Constrained Mdps (2026)0.00
- Lyapunov Robust Constrained-mdps: Soft-constrained Robustly Stable Policy Optimization Under Model Uncertainty (2021)0.00
- A Primal-dual Algorithm For Offline Constrained Reinforcement Learning With Linear Mdps (2024)0.00
- Confident Natural Policy Gradient For Local Planning In \(q_\pi\)-realizable Constrained Mdps (2024)0.00