Behind the Scenes: Mechanistic Interpretability of LoRA-adapted Whisper for Speech Emotion Recognition
2025 · Yujian Ma, Xikun Lu, Jinqiu Sang, et al.
Abstract
Large pre-trained speech models such as Whisper offer strong generalization but pose significant challenges for resource-efficient adaptation. Low-Rank Adaptation (LoRA) has become a popular parameter-efficient fine-tuning method, yet its underlying mechanisms in speech tasks remain poorly understood. In this work, we conduct the first systematic mechanistic interpretability study of LoRA within the Whisper encoder for speech emotion recognition (SER). Using a suite of analytical tools, including layer contribution probing, logit-lens inspection, and representational similarity via singular value decomposition (SVD) and centered kernel alignment (CKA), we reveal two key mechanisms: a delayed specialization process that preserves general features in early layers before consolidating task-specific information, and a forward alignment, backward differentiation dynamic between LoRA's matrices. Our findings clarify how LoRA reshapes encoder hierarchies, providing both empirical insights and
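Of the tools named in the abstract, centered kernel alignment is the easiest to illustrate in isolation. The following is a minimal sketch of the standard linear CKA formula, not the authors' implementation: the function name `linear_cka` and the random matrices standing in for layer activations are assumptions made for illustration only, and the abstract does not say which CKA variant the paper uses.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices.

    x and y have shape (n_samples, n_features); the feature
    dimensions may differ, the sample dimension must match.
    """
    # Center each feature column so the score ignores mean offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # Squared Frobenius norm of the cross-covariance, normalized by
    # the Frobenius norms of the two self-covariances.
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


# Hypothetical stand-ins for encoder activations (200 frames, 32 dims).
rng = np.random.default_rng(0)
frozen = rng.standard_normal((200, 32))
# An orthogonal rotation of the same features.
q, _ = np.linalg.qr(rng.standard_normal((32, 32)))
rotated = frozen @ q
# Independent features with no shared structure.
unrelated = rng.standard_normal((200, 32))

print(linear_cka(frozen, rotated))
print(linear_cka(frozen, unrelated))
```

Linear CKA is invariant to orthogonal rotations and isotropic scaling of the features, so the rotated copy scores close to 1 while the unrelated matrix scores much lower. That property is what makes it suitable for comparing the same layer before and after adaptation.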
Related papers
- Sparsely Shared LoRA On Whisper For Child Speech Recognition (2023) 9.59
- Investigating Training Strategies And Model Robustness Of Low-rank Adaptation For Language Modeling In Speech Recognition (2024) 0.00
- Whisper-LM: Improving ASR Models With Language Models For Low-resource Languages (2025) 3.29
- Dual-pipeline With Low-rank Adaptation For New Language Integration In Multilingual ASR (2024) 3.58
- Probing The Hidden Talent Of ASR Foundation Models For L2 English Oral Assessment (2025) 0.00
- EELE: Exploring Efficient And Extensible LoRA Integration In Emotional Text-to-speech (2024) 2.26
- Low-rank Adaptation Of Large Language Model Rescoring For Parameter-efficient Speech Recognition (2023) 11.76
- Multilingual DistilWhisper: Efficient Distillation Of Multi-task Speech Models Via Language-specific Experts (2023) 8.09