Bounded Robustness In Reinforcement Learning Via Lexicographic Objectives
2022 · Daniel Jarne Ornia, Licio Romao, Lewis Hammond, et al.
Abstract
Policy robustness in Reinforcement Learning may not be desirable at any cost: the alterations caused by robustness requirements from otherwise optimal policies should be explainable, quantifiable and formally verifiable. In this work we study how policies can be maximally robust to arbitrary observational noise by analysing how they are altered by this noise through a stochastic linear operator interpretation of the disturbances, and establish connections between robustness and properties of the noise kernel and of the underlying MDPs. Then, we construct sufficient conditions for policy robustness, and propose a robustness-inducing scheme, applicable to any policy gradient algorithm, that formally trades off expected policy utility for robustness through lexicographic optimisation, while preserving convergence and sub-optimality in the policy synthesis.
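As a rough illustration of the "stochastic linear operator" reading of observational noise mentioned in the abstract, the sketch below treats the noise as a row-stochastic matrix acting on a tabular policy. The function names (`apply_noise`, `robustness_gap`), the tabular setting, and the use of total-variation distance as the gap measure are assumptions made for this example, not definitions taken from the paper.

```python
def apply_noise(policy, kernel):
    """Return the policy the agent effectively follows under noisy observations.

    policy[o][a]: probability of action a when state o is observed.
    kernel[s][o]: probability of observing o when the true state is s
                  (each row is assumed to sum to 1).
    """
    n_obs = len(policy)
    n_actions = len(policy[0])
    return [
        [sum(kernel[s][o] * policy[o][a] for o in range(n_obs))
         for a in range(n_actions)]
        for s in range(len(kernel))
    ]


def robustness_gap(policy, kernel):
    """Worst-case total-variation distance between clean and noisy action distributions."""
    noisy = apply_noise(policy, kernel)
    return max(
        0.5 * sum(abs(n - p) for n, p in zip(noisy_row, clean_row))
        for noisy_row, clean_row in zip(noisy, policy)
    )


# Hypothetical 3-state, 2-action example: the noise confuses states 0 and 1.
kernel = [[0.5, 0.5, 0.0],
          [0.5, 0.5, 0.0],
          [0.0, 0.0, 1.0]]

# Acts identically on the two confusable states, so the noise changes nothing.
invariant_policy = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Acts differently on the two confusable states, so the noise alters its behaviour.
sensitive_policy = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

gap_invariant = robustness_gap(invariant_policy, kernel)
gap_sensitive = robustness_gap(sensitive_policy, kernel)
```

In this toy setting a policy is unaffected by the noise exactly when it is a fixed point of the operator, which is one way to read the abstract's link between robustness and properties of the noise kernel.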
Related papers
- Lyapunov Robust Constrained-MDPs: Soft-constrained Robustly Stable Policy Optimization Under Model Uncertainty (2021) · 0.00
- Robust Reinforcement Learning: A Case Study In Linear Quadratic Regulation (2020) · 11.19
- Safe Reinforcement Learning With Dual Robustness (2023) · 8.60
- Enhancing Robustness In Deep Reinforcement Learning: A Lyapunov Exponent Approach (2024) · 0.00
- Solving Robust MDPs Through No-regret Dynamics (2023) · 0.00
- Robust Model-free Reinforcement Learning With Multi-objective Bayesian Optimization (2019) · 11.08
- Learning Robust Control For LQR Systems With Multiplicative Noise Via Policy Gradient (2019) · 0.00
- Risk-sensitive Reinforcement Learning With Exponential Criteria (2022) · 0.00