Multi-level Knowledge Distillation For Speech Emotion Recognition In Noisy Conditions
2023 · Yang Liu, Haoqin Sun, Geng Chen, et al.
Abstract
Speech emotion recognition (SER) performance deteriorates significantly in the presence of noise, which makes competitive results hard to achieve in noisy conditions. To address this, we propose a multi-level knowledge distillation (MLKD) method that transfers knowledge from a teacher model trained on clean speech to a simpler student model trained on noisy speech. Specifically, we use clean speech features extracted by wav2vec-2.0 as the learning target and train distil wav2vec-2.0 to approximate, under noisy conditions, the feature extraction ability of the original wav2vec-2.0. Furthermore, we use the multi-level knowledge of the original wav2vec-2.0 to supervise the single-level output of distil wav2vec-2.0. We evaluate the proposed method through extensive experiments on the IEMOCAP dataset with speech contaminated by five types of noise; the results are promising compared to state-of-the-art models.
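The abstract does not state the loss function, so the following is only a minimal sketch of the general idea: several teacher feature levels computed from clean speech supervise one student output computed from the noisy version of the same utterance. The mean squared error distance, the uniform level weights, and all names (`mse`, `multi_level_distillation_loss`, `student_noisy`, `teacher_clean_levels`) are assumptions made for illustration, not the authors' implementation.

```python
from typing import Optional, Sequence

# A feature matrix: one row per frame, one column per feature dimension.
Matrix = Sequence[Sequence[float]]


def mse(a: Matrix, b: Matrix) -> float:
    """Mean squared error between two equally shaped feature matrices."""
    if len(a) != len(b):
        raise ValueError("number of frames differs")
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            raise ValueError("feature dimensions differ")
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    if count == 0:
        raise ValueError("empty feature matrix")
    return total / count


def multi_level_distillation_loss(
    student_out: Matrix,
    teacher_levels: Sequence[Matrix],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted sum of distances between the single student output
    (from noisy speech) and each teacher level (from clean speech).

    Assumption: MSE distance and uniform weights by default.
    """
    if not teacher_levels:
        raise ValueError("at least one teacher level is required")
    if weights is None:
        weights = [1.0 / len(teacher_levels)] * len(teacher_levels)
    if len(weights) != len(teacher_levels):
        raise ValueError("one weight per teacher level is required")
    return sum(w * mse(student_out, t) for w, t in zip(weights, teacher_levels))


# Toy example: 2 frames x 2 dims, two teacher levels.
student_noisy = [[1.0, 2.0], [3.0, 4.0]]
teacher_clean_levels = [
    [[1.0, 2.0], [3.0, 4.0]],  # identical to the student output
    [[2.0, 2.0], [3.0, 4.0]],  # differs in one entry
]
loss = multi_level_distillation_loss(student_noisy, teacher_clean_levels)
print(loss)
```

In a real training setup the matrices would be hidden states of the teacher and student networks and the loss would be minimised with respect to the student's parameters only; the sketch covers just the loss computation.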
Related papers
- Multi-teacher Language-aware Knowledge Distillation For Multilingual Speech Emotion Recognition (2025)
- Hierarchical Network With Decoupled Knowledge Distillation For Speech Emotion Recognition (2023)
- Integrated Multi-level Knowledge Distillation For Enhanced Speaker Verification (2024)
- Speech Emotion Recognition With Distilled Prosodic And Linguistic Affect Representations (2023)
- Two-stage Framework For Robust Speech Emotion Recognition Using Target Speaker Extraction In Human Speech Noise Conditions (2024)
- Dual-branch Knowledge Distillation For Noise-robust Synthetic Speech Detection (2023)
- Emphasized Non-target Speaker Knowledge In Knowledge Distillation For Automatic Speaker Verification (2023)
- An Efficient End-to-end Approach To Noise Invariant Speech Features Via Multi-task Learning (2024)