Online Policy Distillation With Decision-attention
2024 · Xinqiang Yu, Chuanguang Yang, Chengqing Yu, et al.
Abstract
Policy Distillation (PD) has become an effective method for improving performance on deep reinforcement learning tasks. The core idea of PD is to distill policy knowledge from a teacher agent into a student agent. However, the teacher-student framework requires a well-trained teacher model, which is computationally expensive. Drawing on online knowledge distillation, we study knowledge transfer between different policies that can learn diverse knowledge from the same environment. In this work, we propose Online Policy Distillation (OPD) with Decision-Attention (DA), an online learning framework in which different policies operate in the same environment, learn different perspectives of it, and transfer knowledge to one another so that they achieve better performance together. In the absence of a well-performing teacher policy, group-derived targets play a key role in transferring group knowledge to each student policy. However, naive aggregation functions tend to cause student policies quic…
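The idea of a group-derived target can be illustrated with a minimal sketch. It assumes a discrete action space, builds each student's target by averaging its peers' action distributions (the kind of naive aggregation the abstract refers to), and scores each student with KL(target ‖ student). The optional `weights` argument is a hypothetical stand-in for a learned weighting such as Decision-Attention; the abstract does not say how DA computes its weights, so this is not the authors' implementation. All function names are illustrative, and the reinforcement learning loss each policy would also optimize is omitted.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one state's action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def group_target(policies: Sequence[Sequence[float]],
                 weights: Optional[Sequence[float]] = None) -> List[float]:
    """Aggregate peer action distributions into a single target distribution.

    weights=None gives uniform averaging (a naive aggregation function).
    Non-uniform weights are a hypothetical stand-in for an attention-derived
    weighting; they are normalized to sum to 1.
    """
    if weights is None:
        weights = [1.0] * len(policies)
    total = sum(weights)
    num_actions = len(policies[0])
    return [sum(w * p[a] for w, p in zip(weights, policies)) / total
            for a in range(num_actions)]


def distillation_losses(student_logits: Sequence[Sequence[float]],
                        weights: Optional[Sequence[float]] = None) -> List[float]:
    """Per-student loss KL(group target || student) for a single state.

    Assumption: each student's target is built from its peers only (itself
    excluded), so at least two students are required.
    """
    probs = [softmax(logits) for logits in student_logits]
    losses = []
    for i, p_i in enumerate(probs):
        peers = [p for j, p in enumerate(probs) if j != i]
        # Drop the student's own weight so it only learns from its peers.
        peer_weights = None if weights is None else [w for j, w in enumerate(weights) if j != i]
        target = group_target(peers, peer_weights)
        losses.append(kl_divergence(target, p_i))
    return losses
```

For example, `distillation_losses([[2.0, 0.5, -1.0], [1.5, 1.0, -0.5], [0.0, 0.2, 2.0]])` returns one non-negative loss per student, while three identical students give losses of zero because each already matches its group target. Uniform averaging pulls every student toward the same consensus, which is one way to see why the choice of aggregation function matters.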
Related papers
- Dual Policy Distillation (2020) · 10.61
- Offline Behavior Distillation (2024) · 0.00
- Periodic Intra-ensemble Knowledge Distillation For Reinforcement Learning (2020) · 4.52
- TCOD: Exploring Temporal Curriculum In On-policy Distillation For Multi-turn Autonomous Agents (2026) · 0.00
- Fedhpd: Heterogeneous Federated Reinforcement Learning Via Policy Distillation (2025) · 2.26
- KD-MARL: Resource-aware Knowledge Distillation In Multi-agent Reinforcement Learning (2026) · 0.00
- Continual Deep Reinforcement Learning With Task-agnostic Policy Distillation (2024) · 0.00
- How Ensembles Of Distilled Policies Improve Generalisation In Reinforcement Learning (2025) · 0.00