Single-trajectory Distributionally Robust Reinforcement Learning
2023 · Zhipeng Liang, Xiaoteng Ma, Jose Blanchet, et al.
Abstract
To mitigate the limitation that the classical reinforcement learning (RL) framework heavily relies on identical training and test environments, Distributionally Robust RL (DRRL) has been proposed to enhance performance across a range of environments, possibly including unknown test environments. As the price of this robustness gain, DRRL involves optimizing over a set of distributions, which is inherently more challenging than optimizing over a fixed distribution in the non-robust case. Existing DRRL algorithms are either model-based or fail to learn from a single sample trajectory. In this paper, we design the first fully model-free DRRL algorithm, called distributionally robust Q-learning with single trajectory (DRQ). We carefully design a multi-timescale framework to fully utilize each incrementally arriving sample and directly learn the optimal distributionally robust policy without modelling the environment; thus, the algorithm can be trained along a single trajectory in a model-free fashion…
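The abstract describes DRQ only at a high level. As a rough illustration of the multi-timescale idea, here is a minimal Python sketch under stated assumptions: it assumes an (s,a)-rectangular KL-divergence ambiguity set around the unknown transition kernel, for which the worst-case expectation reduces to a one-dimensional dual over a temperature β, namely sup over β > 0 of -β log E[exp(-V/β)] - βρ. The paper's own ambiguity set, step-size schedules, and update rules may differ. Each incoming sample updates moment estimates on the fastest timescale, β on a middle timescale, and Q on the slowest. All names (`kl_robust_value`, `drq_sketch`, `toy_env`), the clipping of β, the projection of the target onto [0, r_max/(1-γ)], and every constant are hypothetical choices made for this sketch.

```python
import math
import random


def kl_robust_value(values, probs, rho, n_grid=4000):
    """Worst-case mean of `values` over all P with KL(P || P0) <= rho, via the 1-D dual
    sup_{beta>0} -beta*log E_P0[exp(-v/beta)] - beta*rho (assumes every prob > 0).
    Grid search is enough here because the dual objective is concave in beta."""
    vmin = min(values)
    best = vmin  # limit of the dual objective as beta -> 0
    for i in range(n_grid):
        beta = 10 ** (-3 + 7 * i / (n_grid - 1))  # log-spaced grid on [1e-3, 1e4]
        # Shift by vmin so the average never underflows to zero.
        z = sum(p * math.exp(-(v - vmin) / beta) for v, p in zip(values, probs))
        best = max(best, vmin - beta * math.log(z) - beta * rho)
    return best


def drq_sketch(env_step, n_states, n_actions, rho, gamma, steps,
               beta_min=0.1, beta_max=10.0, r_max=1.0, seed=0):
    """Hypothetical three-timescale, single-trajectory robust Q-learning under a KL ball.
    Assumes rewards lie in [0, r_max]."""
    rng = random.Random(seed)
    v_max = r_max / (1.0 - gamma)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    z = [[1.0] * n_actions for _ in range(n_states)]   # tracks E[exp(-V(s')/beta)]
    y = [[0.0] * n_actions for _ in range(n_states)]   # tracks E[V(s') * exp(-V(s')/beta)]
    beta = [[1.0] * n_actions for _ in range(n_states)]
    visits = [[0] * n_actions for _ in range(n_states)]
    s = 0
    for _ in range(steps):
        a = rng.randrange(n_actions)                  # uniform behavior policy
        r, s_next = env_step(s, a, rng)               # one new sample from the trajectory
        visits[s][a] += 1
        n = visits[s][a]
        fast, mid, slow = n ** -0.6, n ** -0.8, n ** -0.9  # fast >> mid >> slow
        v = max(Q[s_next])
        e = math.exp(-v / beta[s][a])
        # Fastest timescale: moment estimates for the current beta.
        z[s][a] += fast * (e - z[s][a])
        y[s][a] += fast * (v * e - y[s][a])
        # Middle timescale: gradient ascent on the dual objective in beta.
        b = beta[s][a]
        dual = -b * math.log(z[s][a]) - b * rho
        grad = -math.log(z[s][a]) - y[s][a] / (b * z[s][a]) - rho
        beta[s][a] = min(max(b + mid * grad, beta_min), beta_max)
        # Slowest timescale: Q moves toward the robust Bellman target,
        # with the dual estimate projected onto the known value range [0, v_max].
        target = r + gamma * min(max(dual, 0.0), v_max)
        Q[s][a] += slow * (target - Q[s][a])
        s = s_next
    return Q, beta


def toy_env(state, action, rng):
    """Hypothetical 2-state MDP with rewards in [0, 1]."""
    if state == 1:                                 # "trap": no reward, escape w.p. 0.5
        return 0.0, 0 if rng.random() < 0.5 else 1
    if action == 0:                                # "safe": reward 0.5, stay in state 0
        return 0.5, 0
    return 1.0, 1 if rng.random() < 0.2 else 0     # "risky": may fall into the trap


Q, beta = drq_sketch(toy_env, n_states=2, n_actions=2, rho=0.1, gamma=0.9, steps=20_000)
print(kl_robust_value([0.0, 1.0], [0.5, 0.5], rho=0.1))  # about 0.28, below the nominal mean 0.5
```

The step-size exponents (0.6, 0.8, 0.9) are illustrative: they only need to make the moment estimates move fastest and Q slowest, so that each slower quantity sees the faster ones as approximately converged, which is what lets every sample be used once, as it arrives, without storing a model.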