Group Distributionally Robust Reinforcement Learning With Hierarchical Latent Variables
2022 · Mengdi Xu, Peide Huang, Yaru Niu, et al.
Abstract
One key challenge for multi-task reinforcement learning (RL) in practice is the absence of task indicators. Robust RL has been applied to deal with task ambiguity, but may result in over-conservative policies. To balance worst-case performance (robustness) and average performance, we propose the Group Distributionally Robust Markov Decision Process (GDR-MDP), a flexible hierarchical MDP formulation that encodes task groups via a latent mixture model. GDR-MDP identifies the optimal policy that maximizes the expected return under the worst-possible qualified belief over task groups within an ambiguity set. We rigorously show that GDR-MDP's hierarchical structure improves distributional robustness by adding regularization to the worst possible outcomes. We then develop deep RL algorithms for GDR-MDP for both value-based and policy-based RL methods. Extensive experiments on Box2D control tasks, MuJoCo benchmarks, and Google football platforms show that our algorithms outperform classic robust training
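The worst-case objective described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm. It assumes a fixed policy whose expected return for each task group is already known, and it assumes the ambiguity set is a total-variation ball around a nominal belief over groups (the abstract does not specify the form of the set). The names `worst_case_belief`, `robust_return`, `values`, `nominal`, and `radius` are hypothetical.

```python
import numpy as np


def worst_case_belief(values, nominal, radius):
    """Belief over task groups that minimizes expected return.

    Assumption (not from the source): the ambiguity set is
    {q on the simplex : 0.5 * ||q - nominal||_1 <= radius}.
    Under that assumption the minimizer moves up to `radius` probability
    mass from the highest-return groups onto the lowest-return group.
    """
    values = np.asarray(values, dtype=float)
    q = np.asarray(nominal, dtype=float).copy()
    worst = int(np.argmin(values))

    # Mass we may move is capped by the radius and by what the other groups hold.
    budget = min(radius, 1.0 - q[worst])
    remaining = budget
    for g in np.argsort(-values):  # visit groups from highest to lowest return
        if g == worst or remaining <= 0:
            continue
        moved = min(q[g], remaining)
        q[g] -= moved
        remaining -= moved
    q[worst] += budget - remaining
    return q


def robust_return(values, nominal, radius):
    """Expected return under the worst-case belief in the ambiguity set."""
    q = worst_case_belief(values, nominal, radius)
    return float(np.dot(q, np.asarray(values, dtype=float)))


if __name__ == "__main__":
    group_returns = [1.0, 5.0, 3.0]   # hypothetical per-group returns of one policy
    nominal_belief = [0.2, 0.5, 0.3]  # hypothetical nominal belief over groups
    print(worst_case_belief(group_returns, nominal_belief, radius=0.1))
    print(robust_return(group_returns, nominal_belief, radius=0.1))
```

Setting `radius` to 0 recovers the average-case return under the nominal belief, while a radius of 1 places all mass on the worst group. That is one way to picture the trade-off between average and worst-case performance that the abstract refers to.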
Related papers
- On The Foundation Of Distributionally Robust Reinforcement Learning (2023)
- The Curious Price Of Distributional Robustness In Reinforcement Learning With A Generative Model (2023)
- Single-trajectory Distributionally Robust Reinforcement Learning (2023)
- Distributionally Robust Model-based Reinforcement Learning With Large State Spaces (2023)
- Sample-efficient Robust Multi-agent Reinforcement Learning In The Face Of Environmental Uncertainty (2024)
- Noise Distribution Decomposition Based Multi-agent Distributional Reinforcement Learning (2023)
- Deep Hierarchical Reinforcement Learning Algorithm In Partially Observable Markov Decision Processes (2018)
- Distributional Reinforcement Learning For Multi-dimensional Reward Functions (2021)