A Near-optimal Primal-dual Method For Off-policy Learning In CMDP
2022 Β· Fan Chen, Junyu Zhang, Zaiwen Wen
Abstract
As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, there still lacks some essential understanding of the offline CMDP problems, in terms of both the algorithm design and the information theoretic sample complexity lower bound. In this paper, we focus on solving the CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient \(C^*\), we establish an \(Ξ©\left(\frac\{\min\left\\{|\mathcal\{S\}||\mathcal\{A\}|,|\mathcal\{S\}|+I\right\\} C^*\}\{(1-\gamma)^3\epsilon^2\}\right)\) sample complexity lower bound for the offline CMDP problem, where \(I\) stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably gu
Authors
(none)
Tags
Stats
Related papers
- A Policy Gradient Primal-dual Algorithm For Constrained Mdps With Uniform PAC Guarantees (2024)0.00
- Provably Efficient Primal-dual Reinforcement Learning For Cmdps With Non-stationary Objectives And Constraints (2022)0.00
- A Primal-dual Algorithm For Offline Constrained Reinforcement Learning With Linear Mdps (2024)0.00
- Achieving Zero Constraint Violation For Constrained Reinforcement Learning Via Primal-dual Approach (2021)9.59
- Policy Optimization For Constrained Mdps With Provable Fast Global Convergence (2021)0.00
- Efficient Policy Optimization In Robust Constrained Mdps With Iteration Complexity Guarantees (2025)0.00
- Stochastic Primal-dual Methods And Sample Complexity Of Reinforcement Learning (2016)0.00
- Learning General Parameterized Policies For Infinite Horizon Average Reward Constrained Mdps Via Primal-dual Policy Gradient Algorithm (2024)0.00