Bring Your Own (Non-Robust) Algorithm to Solve Robust MDPs by Estimating the Worst Kernel
2023 · Kaixin Wang, Uri Gadot, Navdeep Kumar, et al.
Abstract
Robust Markov Decision Processes (RMDPs) provide a framework for sequential decision-making that is robust to perturbations of the transition kernel. However, current RMDP methods are often limited to small-scale problems, hindering their use in high-dimensional domains. To bridge this gap, we present EWoK, a novel online approach to solving RMDPs that Estimates the Worst transition Kernel to learn robust policies. Unlike previous works that regularize the policy or value updates, EWoK achieves robustness by simulating the worst scenarios for the agent while retaining complete flexibility in the learning process. Notably, EWoK can be applied on top of any off-the-shelf non-robust RL algorithm, enabling easy scaling to high-dimensional domains. Our experiments, spanning from simple Cartpole to high-dimensional DeepMind Control Suite environments, demonstrate the effectiveness and applicability of the EWoK paradigm as a practical method for learning robust policies.
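To make the idea of "simulating the worst scenarios" concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a pessimistic transition can be approximated by drawing several candidate next states from the nominal simulator and resampling one with probability proportional to exp(-V(s') / temperature), so that low-value states become more likely. This weighting rule and the names `worst_case_weights`, `pessimistic_step`, `sample_next_state`, `value_fn`, `num_candidates`, and `temperature` are all hypothetical and are not taken from the abstract.

```python
import math
import random
from typing import Callable, List, Sequence, TypeVar

State = TypeVar("State")


def worst_case_weights(values: Sequence[float], temperature: float) -> List[float]:
    """Softmin weights: candidates with lower value get higher probability.

    Assumption for illustration only; the abstract does not specify a formula.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    lowest = min(values)
    # Shift by the minimum before exponentiating for numerical stability.
    unnormalized = [math.exp(-(v - lowest) / temperature) for v in values]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def pessimistic_step(
    sample_next_state: Callable[[], State],
    value_fn: Callable[[State], float],
    num_candidates: int,
    temperature: float,
    rng: random.Random,
) -> State:
    """Draw candidates from the nominal simulator, then resample pessimistically."""
    candidates = [sample_next_state() for _ in range(num_candidates)]
    weights = worst_case_weights([value_fn(s) for s in candidates], temperature)
    # The chosen state is handed to the learner as if it were the real transition.
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Under this sketch, the learner only ever sees the resampled next state, which is why any non-robust algorithm could in principle be trained on top of it without changes to its policy or value updates. A small `temperature` concentrates the choice on the lowest-value candidate, while a large one recovers near-uniform sampling over the nominal draws.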
Related papers
- Solving Robust MDPs Through No-Regret Dynamics (2023)
- Deep Robust Kalman Filter (2017)
- Online MDP With Transition Prototypes: A Robust Adaptive Approach (2024)
- Sample Complexity Of Robust Reinforcement Learning With A Generative Model (2021)
- Efficient Policy Optimization In Robust Constrained MDPs With Iteration Complexity Guarantees (2025)
- Policy Learning For Robust Markov Decision Process With A Mismatched Generative Model (2022)
- Robust Lagrangian And Adversarial Policy Gradient For Robust Constrained Markov Decision Processes (2023)
- A Bayesian Approach To Robust Reinforcement Learning (2019)