Robust Domain Randomised Reinforcement Learning Through Peer-to-peer Distillation
2020 · Chenyang Zhao, Timothy Hospedales
Abstract
In reinforcement learning, domain randomisation is an increasingly popular technique for learning more general policies that are robust to domain shifts at deployment. However, naively aggregating information from randomised domains may lead to high variance in gradient estimation and an unstable learning process. To address this issue, we present a peer-to-peer online distillation strategy for RL termed P2PDRL, where multiple workers are each assigned to a different environment and exchange knowledge through mutual regularisation based on Kullback-Leibler divergence. Our experiments on continuous control tasks show that P2PDRL enables robust learning across a wider randomisation distribution than baselines, and more robust generalisation to new environments at test time.
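The mutual regularisation described in the abstract can be illustrated with a small sketch. The abstract does not give the exact loss, so everything here is an assumption made for illustration: diagonal Gaussian policies, a shared batch of states evaluated by every worker, the KL direction (from worker k to each peer), and the hypothetical names `gaussian_kl` and `peer_distillation_loss`.

```python
import numpy as np


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(p || q) between diagonal Gaussians, summed over action dimensions."""
    var_p, var_q = std_p ** 2, std_q ** 2
    kl = np.log(std_q / std_p) + (var_p + (mu_p - mu_q) ** 2) / (2.0 * var_q) - 0.5
    return kl.sum(axis=-1)


def peer_distillation_loss(policies, k):
    """Average KL from worker k's policy to every peer's policy.

    policies: list of (mean, std) pairs, one per worker, each array of shape
    (batch, action_dim), all computed on the same batch of states (assumed).
    """
    mu_k, std_k = policies[k]
    terms = [
        gaussian_kl(mu_k, std_k, mu_j, std_j).mean()  # mean over the batch
        for j, (mu_j, std_j) in enumerate(policies)
        if j != k  # a worker is not regularised towards itself
    ]
    return float(np.mean(terms))
```

In a full training loop, a term like this would presumably be added to each worker's own RL objective with some weighting coefficient; that coefficient and the choice of RL algorithm are not specified in the abstract.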
Related papers
- Periodic Intra-ensemble Knowledge Distillation For Reinforcement Learning (2020)
- Distributionally Robust Self Paced Curriculum Reinforcement Learning (2025)
- Robust Visual Domain Randomization For Reinforcement Learning (2019)
- FedHPD: Heterogeneous Federated Reinforcement Learning Via Policy Distillation (2025)
- How To Pick The Domain Randomization Parameters For Sim-to-real Transfer Of Reinforcement Learning Policies? (2019)
- Exploration By Random Distribution Distillation (2025)
- Adversary Agnostic Robust Deep Reinforcement Learning (2020)
- Steering Your Diffusion Policy With Latent Space Reinforcement Learning (2025)