Continual Deep Reinforcement Learning With Task-agnostic Policy Distillation
2024 · Muhammad Burhan Hafez, Kerim Erekmen
Abstract
Central to the development of universal learning systems is the ability to solve multiple tasks without retraining from scratch when new data arrives. This matters because each task requires significant training time. Continual learning is a complex problem, and addressing it requires several complementary capabilities: (1) overcoming catastrophic forgetting to retain previously learned tasks, (2) achieving positive forward transfer for faster learning, (3) scaling across numerous tasks, and (4) learning without task labels, even in the absence of clear task boundaries. This paper introduces the Task-Agnostic Policy Distillation (TAPD) framework, which alleviates problems (1)-(4) by incorporating a task-agnostic phase in which an agent explores its environment without any external goal and maximizes only its intrinsic motivation. The knowledge gained during this phase is later distilled
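The abstract above is truncated and does not specify which intrinsic reward or distillation objective TAPD uses. As a rough illustration of the two ingredients it names, the sketch below assumes a curiosity-style bonus (squared prediction error of a hypothetical forward model) for the task-agnostic exploration phase, and a generic KL-based policy distillation loss that pulls a student policy's action distribution toward a teacher's. These are common textbook choices, not necessarily the authors' method; all function names (`intrinsic_reward`, `distillation_loss`, `distill_step`) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of discrete-action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def intrinsic_reward(predicted_next, actual_next):
    """Assumed curiosity-style bonus: squared error between a hypothetical
    forward model's predicted next-state features and the observed ones.
    No external (task) reward is involved."""
    return sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


def distillation_loss(teacher_logits, student_logits):
    """Generic policy distillation loss for one state: KL(teacher || student)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distill_step(teacher_logits, student_logits, lr=0.5):
    """One gradient-descent step on the student logits.
    The gradient of KL(p || softmax(z)) with respect to z is softmax(z) - p."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return [z - lr * (qi - pi) for z, qi, pi in zip(student_logits, q, p)]


# Usage: a novel transition yields a positive exploration bonus,
# and repeated distillation steps move the student toward the teacher.
bonus = intrinsic_reward([0.5, 1.0], [0.0, 1.0])
teacher = [1.0, 2.0, 3.0]
student = [0.0, 0.0, 0.0]
loss_before = distillation_loss(teacher, student)
for _ in range(200):
    student = distill_step(teacher, student)
loss_after = distillation_loss(teacher, student)
```

KL divergence is used here because it compares full action distributions rather than only the greedy action, which is the usual motivation for distilling a policy instead of copying its argmax choices.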