Periodic Intra-ensemble Knowledge Distillation For Reinforcement Learning
2020 · Zhang-Wei Hong, Prabhat Nagarajan, Guilherme Maeda
Abstract
Off-policy ensemble reinforcement learning (RL) methods have demonstrated impressive results across a range of RL benchmark tasks. Recent works suggest that directly imitating experts' policies in a supervised manner before or during the course of training enables faster policy improvement for an RL agent. Motivated by these recent insights, we propose Periodic Intra-Ensemble Knowledge Distillation (PIEKD). PIEKD is a learning framework that uses an ensemble of policies to act in the environment while periodically sharing knowledge amongst policies in the ensemble through knowledge distillation. Our experiments demonstrate that PIEKD improves upon a state-of-the-art RL method in sample efficiency on several challenging MuJoCo benchmark tasks. Additionally, we perform ablation studies to better understand PIEKD.
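The loop described in the abstract (an ensemble of policies acting in the environment, with periodic knowledge distillation between them) can be illustrated with a minimal toy sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the `LinearPolicy` class, the toy reward in `toy_episode_return`, the choice of the policy with the best recent return as teacher, the mean-squared-error distillation loss, and the period `distill_every`. The per-policy RL update is omitted entirely, so this shows only the act-then-distill schedule, not the authors' method.

```python
import random


class LinearPolicy:
    """Toy deterministic policy (hypothetical): action = w * state + b."""

    def __init__(self, rng):
        self.w = rng.uniform(-1.0, 1.0)
        self.b = rng.uniform(-1.0, 1.0)
        self.recent_return = float("-inf")  # no episode played yet

    def act(self, state):
        return self.w * state + self.b


def distill(teacher, student, states, lr=0.1, steps=200):
    """Pull the student toward the teacher by gradient descent on the
    mean squared error between their actions on the given states.
    (The MSE loss is an assumption; the abstract does not name a loss.)"""
    n = len(states)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for s in states:
            err = student.act(s) - teacher.act(s)
            grad_w += 2.0 * err * s / n
            grad_b += 2.0 * err / n
        student.w -= lr * grad_w
        student.b -= lr * grad_b


def toy_episode_return(policy, rng, target_w=0.5, target_b=0.0):
    """Hypothetical stand-in for an environment rollout: the return is
    higher when the policy's actions are closer to a fixed target mapping."""
    states = [rng.uniform(-1.0, 1.0) for _ in range(20)]
    return -sum((policy.act(s) - (target_w * s + target_b)) ** 2 for s in states)


def run(num_policies=3, episodes=30, distill_every=10, seed=0):
    rng = random.Random(seed)
    ensemble = [LinearPolicy(rng) for _ in range(num_policies)]
    for episode in range(1, episodes + 1):
        # One ensemble member acts in the environment for this episode.
        actor = rng.choice(ensemble)
        actor.recent_return = toy_episode_return(actor, rng)
        # A real agent would apply an off-policy RL update here (omitted).

        # Periodically share knowledge within the ensemble.
        if episode % distill_every == 0:
            # Assumption: the teacher is the member with the best recent return.
            teacher = max(ensemble, key=lambda p: p.recent_return)
            states = [rng.uniform(-1.0, 1.0) for _ in range(64)]
            for student in ensemble:
                if student is not teacher:
                    distill(teacher, student, states)
    return ensemble
```

Calling `run()` returns the ensemble after the final distillation phase, at which point the students' parameters sit close to the teacher's. In a real agent the policies would diverge again between distillation phases as each receives its own RL updates.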
Related papers
- Robust Domain Randomised Reinforcement Learning Through Peer-to-peer Distillation (2020)
- Knowru: Knowledge Reusing Via Knowledge Distillation In Multi-agent Reinforcement Learning (2021)
- KD-MARL: Resource-aware Knowledge Distillation In Multi-agent Reinforcement Learning (2026)
- Online Policy Distillation With Decision-attention (2024)
- Towards Applicable Reinforcement Learning: Improving The Generalization And Sample Efficiency With Policy Ensemble (2022)
- Fedhpd: Heterogeneous Federated Reinforcement Learning Via Policy Distillation (2025)
- MEPG: A Minimalist Ensemble Policy Gradient Framework For Deep Reinforcement Learning (2021)
- How Ensembles Of Distilled Policies Improve Generalisation In Reinforcement Learning (2025)