Confident Natural Policy Gradient For Local Planning In \(q_\pi\)-realizable Constrained MDPs
2024 · Tian Tian, Lin F. Yang, Csaba Szepesvári
Abstract
The constrained Markov decision process (CMDP) framework emerges as an important reinforcement learning approach for imposing safety or other critical objectives while maximizing cumulative reward. However, the current understanding of how to learn efficiently in a CMDP environment with a potentially infinite number of states remains under investigation, particularly when function approximation is applied to the value functions. In this paper, we address the learning problem given linear function approximation with \(q_{\pi}\)-realizability, where the value functions of all policies are linearly representable with a known feature map, a setting known to be more general and challenging than other linear settings. Utilizing a local-access model, we propose a novel primal-dual algorithm that, after \(\tilde{O}(\text{poly}(d)\, \epsilon^{-3})\) queries, outputs with high probability a policy that strictly satisfies the constraints while nearly optimizing the value with respect to a r
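To make the generic primal-dual pattern behind this kind of method concrete, here is a minimal sketch, assuming a tiny tabular CMDP with known transitions and exact (iterative) policy evaluation. It is not the authors' algorithm: their method works with a local-access simulator, \(q_\pi\)-realizable linear features and confidence-based estimates, none of which is modeled here. The sketch maximizes \(V_r^\pi(s_0)\) subject to \(V_g^\pi(s_0) \ge b\) through the Lagrangian \(V_r^\pi + \lambda (V_g^\pi - b)\): the primal step is a softmax natural policy gradient update along the Lagrangian advantage, and the dual step is projected gradient descent on \(\lambda \ge 0\). The function names (`softmax`, `evaluate`, `primal_dual_npg`), step sizes, and the single-state toy instance are all hypothetical choices for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one state's action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def evaluate(policy, P, rew, gamma, iters=100):
    """Iterative policy evaluation for the per-step signal rew[s][a]; returns (Q, V)."""
    S, A = len(rew), len(rew[0])
    V = [0.0] * S
    Q = [[0.0] * A for _ in range(S)]
    for _ in range(iters):
        # P[s][a][t] is the (assumed known) probability of moving from s to t under a.
        Q = [[rew[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(S))
              for a in range(A)] for s in range(S)]
        V = [sum(policy[s][a] * Q[s][a] for a in range(A)) for s in range(S)]
    return Q, V


def primal_dual_npg(P, r, g, b, s0, gamma, K=1000, eta=0.05, eta_lam=0.05):
    """Hypothetical tabular sketch: max V_r(s0) subject to V_g(s0) >= b."""
    S, A = len(r), len(r[0])
    logits = [[0.0] * A for _ in range(S)]
    lam = 0.0
    vr_hist, vg_hist = [], []
    for _ in range(K):
        policy = [softmax(row) for row in logits]
        Qr, Vr = evaluate(policy, P, r, gamma)
        Qg, Vg = evaluate(policy, P, g, gamma)
        vr_hist.append(Vr[s0])
        vg_hist.append(Vg[s0])
        for s in range(S):
            # Lagrangian Q-values: Q_r + lam * Q_g (evaluation is linear in the signal).
            q_lag = [Qr[s][a] + lam * Qg[s][a] for a in range(A)]
            v_lag = sum(policy[s][a] * q_lag[a] for a in range(A))
            # Softmax NPG step along the advantage; the 1/(1 - gamma) factor is folded into eta.
            for a in range(A):
                logits[s][a] += eta * (q_lag[a] - v_lag)
        # Projected dual descent: lam grows while the constraint is violated.
        lam = max(0.0, lam - eta_lam * (Vg[s0] - b))
    # Averages correspond to the uniform mixture of the K iterates.
    avg_vr = sum(vr_hist) / K
    avg_vg = sum(vg_hist) / K
    return avg_vr, avg_vg, lam, [softmax(row) for row in logits]


# Toy single-state CMDP: action 0 earns reward, action 1 earns constraint utility.
P = [[[1.0], [1.0]]]
r = [[1.0, 0.0]]
g = [[0.0, 1.0]]
avg_vr, avg_vg, lam, policy = primal_dual_npg(P, r, g, b=1.0, s0=0, gamma=0.5)
print(avg_vr, avg_vg, lam, policy)
```

Returning averages rather than the last iterate reflects the usual primal-dual argument: since \(\lambda_{k+1} \ge \lambda_k - \eta_\lambda (V_g^{\pi_k}(s_0) - b)\), summing over \(K\) steps from \(\lambda_0 = 0\) gives an average constraint value of at least \(b - \lambda_K / (\eta_\lambda K)\). The abstract's stronger guarantee, a single policy that strictly satisfies the constraints, is not reproduced by this sketch.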
Related papers
- Online RL In Linearly \(q^\pi\)-realizable MDPs Is As Easy As In Linear MDPs If You Learn What To Ignore (2023)
- A Policy Gradient Primal-dual Algorithm For Constrained MDPs With Uniform PAC Guarantees (2024)
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025)
- A Safe Exploration Approach To Constrained Markov Decision Processes (2023)
- Learning General Parameterized Policies For Infinite Horizon Average Reward Constrained MDPs Via Primal-dual Policy Gradient Algorithm (2024)
- Provably Efficient Primal-dual Reinforcement Learning For CMDPs With Non-stationary Objectives And Constraints (2022)
- A Near-optimal Primal-dual Method For Off-policy Learning In CMDP (2022)
- Last-iterate Convergence Of General Parameterized Policies In Constrained MDPs (2026)