
BAM! Born-again Multi-task Networks For Natural Language Understanding

Kevin Clark, Minh-Thang Luong, Urvashi Khandelwal, Christopher D. Manning, Quoc V. Le · Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics · 2019

It can be challenging to train multi-task neural networks that outperform or even match their single-task counterparts. To help address this, we propose using knowledge distillation where single-task models teach a multi-task model. We enhance this training with teacher annealing, a novel method that gradually transitions the model from distillation to supervised learning, helping the multi-task model surpass its single-task teachers. We evaluate our approach by multi-task fine-tuning BERT on the GLUE benchmark. Our method consistently improves over standard single-task and multi-task training.
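The teacher-annealing idea described above can be sketched as interpolating the training target between the single-task teacher's soft predictions and the gold labels over the course of training. The following is a minimal illustration, not the authors' implementation: the linear schedule, function names, and a simple cross-entropy are assumptions for the sake of the example.

```python
import numpy as np

def annealed_target(gold_onehot, teacher_probs, step, total_steps):
    """Teacher annealing (sketch): early in training the target is mostly
    the teacher's soft prediction (distillation); late in training it is
    mostly the gold label (supervised learning).

    A linear schedule is assumed here; the paper's exact schedule may differ.
    """
    lam = step / total_steps  # anneals from 0 to 1 over training
    return lam * gold_onehot + (1.0 - lam) * teacher_probs

def cross_entropy(student_probs, target):
    # Standard cross-entropy against the (possibly soft) annealed target.
    return -np.sum(target * np.log(student_probs + 1e-12))

# Toy example: a 2-class task.
gold = np.array([1.0, 0.0])      # gold label (one-hot)
teacher = np.array([0.7, 0.3])   # single-task teacher's prediction

start = annealed_target(gold, teacher, step=0, total_steps=100)    # pure distillation
end = annealed_target(gold, teacher, step=100, total_steps=100)    # pure supervised
```

At step 0 the target equals the teacher's distribution, and at the final step it equals the gold label, so the multi-task student smoothly shifts from imitating its teachers to fitting the true labels.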
