Learning Deterministic Policies With Policy Gradients In Constrained Markov Decision Processes
2025 · Alessandro Montenegro, Leonardo Cesani, Marco Mussi, et al.
Abstract
Constrained Reinforcement Learning (CRL) addresses sequential decision-making problems where agents are required to achieve goals by maximizing the expected return while meeting domain-specific constraints. In this setting, policy-based methods are widely used thanks to their advantages when dealing with continuous-control problems. These methods search in the policy space with an action-based or a parameter-based exploration strategy, depending on whether they learn the parameters of a stochastic policy or those of a stochastic hyperpolicy. We introduce an exploration-agnostic algorithm, called C-PG, which enjoys global last-iterate convergence guarantees under gradient domination assumptions. Furthermore, under specific noise models where the (hyper)policy is expressed as a stochastic perturbation of the actions or of the parameters of an underlying deterministic policy, we establish global last-iterate convergence guarantees of C-PG to the optimal deterministic policy.
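The abstract does not spell out the C-PG update rules. The sketch below is only a rough illustration of the idea it describes, under explicitly hypothetical assumptions: a generic primal-dual policy-gradient loop on the regularized Lagrangian L(θ, λ) = J₀(θ) + λ·(J₁(θ) − b) − (ω/2)·λ², with gradient descent on θ and projected gradient ascent on λ ≥ 0, written in terms of costs to minimize (the return is the negative cost). The one-step toy problem, the linear deterministic policy a = θ·s, the Gaussian noise scale σ, the antithetic sampling, the step sizes, and every name in the code (`costs`, `estimate`, `primal_dual`, `omega`, ...) are assumptions made for illustration; this is not the authors' implementation. What it shows is the exploration-agnostic structure: the same loop runs with action-based noise (a stochastic policy a ~ N(θ·s, σ²)) or parameter-based noise (a stochastic hyperpolicy θ′ ~ N(θ, σ²) feeding the deterministic policy), and only where the noise enters and the corresponding score term change. After training, the noise is switched off and the underlying deterministic policy is deployed.

```python
import random

# Hypothetical toy problem (an illustrative assumption, not from the paper):
# one step, state s ~ Uniform(0.5, 1.5), deterministic linear policy a = theta * s.
# Objective cost c0 = (a - 2s)^2 (return = -c0); constraint cost c1 = a^2,
# with the constraint E[c1] <= b.


def costs(s, a):
    """Return (objective cost, constraint cost) for state s and action a."""
    return (a - 2.0 * s) ** 2, a ** 2


def estimate(theta, sigma, mode, n_pairs, rng):
    """Monte Carlo estimates of (J0, J1, dJ0/dtheta, dJ1/dtheta).

    mode == "action": action-based exploration, Gaussian policy
        a ~ N(theta * s, sigma^2); score d/dtheta log pi = e * s / sigma.
    mode == "param": parameter-based exploration, Gaussian hyperpolicy
        theta' ~ N(theta, sigma^2), deterministic a = theta' * s;
        score d/dtheta log nu = e / sigma.
    Each draw e is paired with -e (antithetic sampling) to reduce variance.
    """
    j = [0.0, 0.0]
    g = [0.0, 0.0]
    for _ in range(n_pairs):
        s = rng.uniform(0.5, 1.5)
        eps = rng.gauss(0.0, 1.0)
        for e in (eps, -eps):
            if mode == "action":
                a = theta * s + sigma * e      # perturb the action
                score = e * s / sigma
            elif mode == "param":
                a = (theta + sigma * e) * s    # perturb the parameter
                score = e / sigma
            else:
                raise ValueError(f"unknown mode: {mode}")
            for i, c in enumerate(costs(s, a)):
                j[i] += c
                g[i] += c * score              # likelihood-ratio (REINFORCE-style) term
    m = 2 * n_pairs
    return j[0] / m, j[1] / m, g[0] / m, g[1] / m


def primal_dual(mode, sigma=0.1, b=1.0, omega=0.01, alpha=0.05, beta=0.05,
                iters=500, n_pairs=100, seed=0):
    """Simultaneous descent on theta and projected ascent on lam >= 0 for
    L(theta, lam) = J0 + lam * (J1 - b) - (omega / 2) * lam^2."""
    rng = random.Random(seed)
    theta, lam = 0.0, 0.0
    for _ in range(iters):
        _j0, j1, g0, g1 = estimate(theta, sigma, mode, n_pairs, rng)
        # Primal step uses the current lam; dual step uses J1 at the current theta.
        theta -= alpha * (g0 + lam * g1)
        lam = max(0.0, lam + beta * (j1 - b - omega * lam))
    return theta, lam


if __name__ == "__main__":
    for mode in ("action", "param"):
        theta, lam = primal_dual(mode)
        # Deploy the deterministic policy a = theta * s (noise switched off).
        # For s ~ Uniform(0.5, 1.5), E[s^2] = 13/12, so J1 = theta^2 * 13/12.
        print(f"{mode:6s} theta={theta:.3f} lam={lam:.3f} "
              f"det_constraint={theta ** 2 * 13 / 12:.3f}")
```

Under these assumptions, both modes should settle near θ ≈ 0.96, close to the deterministic constrained optimum θ = √(12/13) of the toy problem. With ω > 0 the dual fixed point satisfies J₁ − b = ω·λ, so the learned policy may violate the constraint by a small margin; shrinking ω reduces that bias, while keeping ω > 0 is what makes the Lagrangian strongly concave in λ.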