Continual Policy Distillation From Distributed Reinforcement Learning Teachers
2026 · Yuxuan Li, Qijun He, Mingqi Yuan, et al.
Abstract
Continual Reinforcement Learning (CRL) aims to develop lifelong learning agents to continuously acquire knowledge across diverse tasks while mitigating catastrophic forgetting. This requires efficiently managing the stability-plasticity dilemma and leveraging prior experience to rapidly generalize to novel tasks. While various enhancement strategies for both aspects have been proposed, achieving scalable performance by directly applying RL to sequential task streams remains challenging. In this paper, we propose a novel teacher-student framework that decouples CRL into two independent processes: training single-task teacher models through distributed RL and continually distilling them into a central generalist model. This design is motivated by the observation that RL excels at solving single tasks, while policy distillation -- a relatively stable supervised learning process -- is well aligned with large foundation models and multi-task learning. Moreover, a mixture-of-experts (MoE) ar
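As a rough illustration of the distillation step the abstract describes, the sketch below computes a common policy-distillation objective: the KL divergence from a teacher's action distribution to the student's, averaged over a batch of states. The abstract does not specify the loss the authors use, so the choice of KL over discrete action logits, and the names `log_softmax`, `distillation_loss`, and `batch_distillation_loss`, are assumptions made for this example only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def distillation_loss(teacher_logits, student_logits):
    """KL(teacher || student) for one state (assumed objective, not the paper's)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def batch_distillation_loss(teacher_batch, student_batch):
    """Mean KL over a batch of per-state logit vectors."""
    losses = [distillation_loss(t, s) for t, s in zip(teacher_batch, student_batch)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical logits for two states with three discrete actions each.
    teacher = [[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]]
    student = [[1.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
    print(batch_distillation_loss(teacher, student))
```

In a continual setting, a loss of this kind would be minimized for each new single-task teacher in turn while the student retains what it learned from earlier teachers; how that retention is achieved is outside what this sketch covers.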
Related papers
- Continual Deep Reinforcement Learning With Task-agnostic Policy Distillation (2024)
- Multi-granularity Knowledge Transfer For Continual Reinforcement Learning (2024)
- Task-agnostic Continual Reinforcement Learning: Gaining Insights And Overcoming Challenges (2022)
- Demonstration-guided Continual Reinforcement Learning In Dynamic Environments (2025)
- Continual Reinforcement Learning By Planning With Online World Models (2025)
- Same State, Different Task: Continual Reinforcement Learning Without Interference (2021)
- Dynamics-adaptive Continual Reinforcement Learning Via Progressive Contextualization (2022)
- Reset & Distill: A Recipe For Overcoming Negative Transfer In Continual Reinforcement Learning (2024)