Robust Lagrangian And Adversarial Policy Gradient For Robust Constrained Markov Decision Processes
2023 Β· David M. Bossens
Abstract
The robust constrained Markov decision process (RCMDP) is a recent task-modelling framework for reinforcement learning that incorporates behavioural constraints and that provides robustness to errors in the transition dynamics model through the use of an uncertainty set. Simulating RCMDPs requires computing the worst-case dynamics based on value estimates for each state, an approach which has previously been used in the Robust Constrained Policy Gradient (RCPG). Highlighting potential downsides of RCPG such as not robustifying the full constrained objective and the lack of incremental learning, this paper introduces two algorithms, called RCPG with Robust Lagrangian and Adversarial RCPG. RCPG with Robust Lagrangian modifies RCPG by taking the worst-case dynamics based on the Lagrangian rather than either the value or the constraint. Adversarial RCPG also formulates the worst-case dynamics based on the Lagrangian but learns this directly and incrementally as an adversarial policy throug
Authors
(none)
Tags
Stats
Related papers
- Lyapunov Robust Constrained-mdps: Soft-constrained Robustly Stable Policy Optimization Under Model Uncertainty (2021)0.00
- Robust Reinforcement Learning Using Least Squares Policy Iteration With Provable Performance Guarantees (2020)0.00
- Efficient Policy Optimization In Robust Constrained Mdps With Iteration Complexity Guarantees (2025)0.00
- Robust Model-based Reinforcement Learning With An Adversarial Auxiliary Model (2024)0.00
- Policy Learning For Robust Markov Decision Process With A Mismatched Generative Model (2022)0.00
- Robust Risk-sensitive Reinforcement Learning With Conditional Value-at-risk (2024)5.84
- Sample Complexity Of Robust Reinforcement Learning With A Generative Model (2021)0.00
- Solving Robust Mdps Through No-regret Dynamics (2023)0.00