Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees
2025 · Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, et al.
Abstract
Constrained decision-making is essential for designing safe policies in real-world control systems, yet simulated environments often fail to capture real-world adversities. We consider the problem of learning a policy that will maximize the cumulative reward while satisfying a constraint, even when there is a mismatch between the real model and an accessible simulator/nominal model. In particular, we consider the robust constrained Markov decision problem (RCMDP) where an agent needs to maximize the reward and satisfy the constraint against the worst possible stochastic model under the uncertainty set centered around an unknown nominal model. Primal-dual methods, effective for standard constrained MDP (CMDP), are not applicable here because of the lack of the strong duality property. Further, one cannot apply the standard robust value-iteration based approach on the composite value function either as the worst case models may be different for the reward value function and the constrain
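The abstract's remark that the worst-case model may differ between the reward value function and the constraint can be illustrated with a toy example. The sketch below is not the paper's algorithm: it assumes a hypothetical three-state chain under a fixed policy, a finite uncertainty set of two made-up transition kernels ("A" and "B"), a cost-style constraint, and an arbitrary multiplier `LAMBDA`. It only shows that taking the worst case of a composite (Lagrangian-style) value is not the same as combining the separate worst cases.

```python
GAMMA = 0.9  # discount factor (assumed)

# Hypothetical chain under a fixed policy: state 0 is the start state,
# states 1 and 2 are absorbing. State 1 yields both reward and cost.
REWARD = [0.0, 1.0, 0.0]
COST = [0.0, 1.0, 0.0]

# Assumed finite uncertainty set: two candidate transition kernels.
KERNELS = {
    "A": [[0.0, 0.2, 0.8], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    "B": [[0.0, 0.8, 0.2], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
}


def evaluate(signal, kernel, gamma=GAMMA, sweeps=2000):
    """Iterative policy evaluation: V <- signal + gamma * P V."""
    n = len(signal)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            signal[s] + gamma * sum(kernel[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
    return v


# Value at the start state under each candidate kernel.
reward_values = {name: evaluate(REWARD, P)[0] for name, P in KERNELS.items()}
cost_values = {name: evaluate(COST, P)[0] for name, P in KERNELS.items()}

# Worst case for the reward is the lowest reward value;
# worst case for a cost constraint is the highest cost value.
worst_reward_model = min(reward_values, key=reward_values.get)
worst_cost_model = max(cost_values, key=cost_values.get)

# Composite value with an arbitrary multiplier (assumption).
LAMBDA = 0.5
composite = {
    name: reward_values[name] - LAMBDA * cost_values[name] for name in KERNELS
}
robust_composite = min(composite.values())
separate_worst_cases = (
    reward_values[worst_reward_model] - LAMBDA * cost_values[worst_cost_model]
)

print("worst model for reward:", worst_reward_model)
print("worst model for cost:", worst_cost_model)
print("worst-case composite value:", round(robust_composite, 4))
print("composite of separate worst cases:", round(separate_worst_cases, 4))
```

In this toy setting the kernel that minimizes the reward value is not the one that maximizes the cost value, so a single robust backup on the composite value reports a different number than evaluating the two robust values separately.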
Related papers
- Lyapunov Robust Constrained-MDPs: Soft-constrained Robustly Stable Policy Optimization Under Model Uncertainty (2021)
- Solving Robust MDPs Through No-regret Dynamics (2023)
- Robust Lagrangian And Adversarial Policy Gradient For Robust Constrained Markov Decision Processes (2023)
- Solving Non-rectangular Reward-robust MDPs Via Frequency Regularization (2023)
- Sample Complexity Of Robust Reinforcement Learning With A Generative Model (2021)
- Policy Learning For Robust Markov Decision Process With A Mismatched Generative Model (2022)
- Provably Efficient Primal-dual Reinforcement Learning For CMDPs With Non-stationary Objectives And Constraints (2022)
- A Safe Exploration Approach To Constrained Markov Decision Processes (2023)