Cross-modal Distillation For Widely Differing Modalities
2025 · Cairong Zhao, Yufeng Jin, Zifan Song, et al.
Abstract
Deep learning has achieved great progress recently; however, it is neither easy nor efficient to further improve its performance by increasing model size. Multi-modal learning can mitigate this challenge by introducing richer and more discriminative information as input. To address the limited access to multi-modal data at inference time, we conduct multi-modal learning by introducing a teacher model that transfers discriminative knowledge to a student model during training. However, this knowledge transfer via distillation is not trivial, because the large domain gap between widely differing modalities can easily lead to overfitting. In this work, we introduce a cross-modal distillation framework. Specifically, we find that a hard-constrained loss, e.g., an l2 loss that forces the student to be exactly the same as the teacher, can easily lead to overfitting in cross-modality distillation. To address this, we propose two soft-constrained knowledge distillation strategies at the feature level
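To make the hard-constrained case concrete, the following is a minimal sketch of an l2 feature-matching loss between paired teacher and student features. It is an illustration only, not the authors' implementation: the function name `l2_feature_loss`, the list-of-lists feature layout, and the toy values are all assumptions. The loss reaches zero only when the student reproduces the teacher's features exactly, which is the constraint the abstract describes as prone to overfitting.

```python
def l2_feature_loss(student_feats, teacher_feats):
    """Mean over the batch of squared L2 distances between paired features.

    Hypothetical helper for illustration; both arguments are lists of
    equal-length feature vectors (one vector per sample).
    """
    if len(student_feats) != len(teacher_feats):
        raise ValueError("batch sizes differ")
    total = 0.0
    for s, t in zip(student_feats, teacher_feats):
        if len(s) != len(t):
            raise ValueError("feature dimensions differ")
        # Squared Euclidean distance for one teacher/student pair.
        total += sum((si - ti) ** 2 for si, ti in zip(s, t))
    return total / len(student_feats)


# Toy features (assumed values): two samples, two dimensions each.
teacher = [[1.0, 0.0], [0.0, 2.0]]
student = [[0.5, 0.0], [0.0, 1.0]]

loss_mismatch = l2_feature_loss(student, teacher)  # any deviation is penalized
loss_exact = l2_feature_loss(teacher, teacher)     # zero only for an exact copy
```

Any soft-constrained alternative would relax this exact-match requirement; the abstract as shown here does not specify how, so none is sketched.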
Related papers
- Knowledge Distillation From Language Model To Acoustic Model: A Hierarchical Multi-task Learning Approach (2021) 3.58
- ModalityMirror: Improving Audio Classification In Modality Heterogeneity Federated Learning With Multimodal Distillation (2024) 2.26
- Audio Representation Learning By Distilling Video As Privileged Information (2023) 0.00
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024) 0.00
- An Efficient End-to-end Approach To Noise Invariant Speech Features Via Multi-task Learning (2024) 0.00
- Mutual Learning Of Single- And Multi-channel End-to-end Neural Diarization (2022) 3.58
- Distilling Knowledge From Ensembles Of Acoustic Models For Joint CTC-attention End-to-end Speech Recognition (2020) 8.09
- Distil-DCCRN: A Small-footprint DCCRN Leveraging Feature-based Knowledge Distillation In Speech Enhancement (2024) 2.26