Optimizing Multi-stuttered Speech Classification: Leveraging Whisper's Encoder For Efficient Parameter Reduction In Automated Assessment
2024 · Huma Ameer, Seemab Latif, Mehwish Fatima
Abstract
The automated classification of stuttered speech has significant implications for timely assessment and can assist speech-language pathologists. Despite notable advances in the field, cases in which multiple disfluencies occur in speech still require attention. We take a progressive approach to fill this gap by classifying multi-stuttered speech more efficiently. First, we curate a dataset of multi-stuttered disfluencies from audio clips in the open-source SEP-28k dataset. Second, we leverage the encoder of Whisper, a state-of-the-art speech recognition model, and frame the problem as multi-label classification. Third, using a Whisper model with 6 encoder layers and experimenting with various layer-freezing strategies, we identify a computationally efficient configuration of the model. The proposed configuration achieved micro, macro, and weighted F1-scores of 0.88, 0.85, and 0.87, respectively, on an external test dataset
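The abstract reports micro, macro, and weighted F1-scores for a multi-label task. As a minimal sketch of how these three averages differ, the snippet below computes them from binary label matrices in plain Python. The toy labels and the function names (`f1_from_counts`, `multilabel_f1`) are illustrative assumptions, not the authors' evaluation code or data.

```python
from typing import Dict, List, Sequence


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); taken as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_f1(
    y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]
) -> Dict[str, float]:
    """Micro, macro, and weighted F1 for binary multi-label predictions."""
    n_labels = len(y_true[0])
    tps: List[int] = []
    fps: List[int] = []
    fns: List[int] = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tps.append(sum(1 for t, p in pairs if t == 1 and p == 1))
        fps.append(sum(1 for t, p in pairs if t == 0 and p == 1))
        fns.append(sum(1 for t, p in pairs if t == 1 and p == 0))

    per_label = [f1_from_counts(tp, fp, fn) for tp, fp, fn in zip(tps, fps, fns)]
    supports = [tp + fn for tp, fn in zip(tps, fns)]  # true positives per label
    total_support = sum(supports)

    # Micro: pool counts over all labels, then compute a single F1.
    micro = f1_from_counts(sum(tps), sum(fps), sum(fns))
    # Macro: unweighted mean of the per-label F1 scores.
    macro = sum(per_label) / n_labels
    # Weighted: per-label F1 averaged with weights proportional to support.
    weighted = (
        sum(f * s for f, s in zip(per_label, supports)) / total_support
        if total_support
        else 0.0
    )
    return {"micro": micro, "macro": macro, "weighted": weighted}


# Hypothetical toy data: 5 clips, 3 disfluency labels (columns), 1 = present.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 1, 1], [1, 0, 0]]

scores = multilabel_f1(y_true, y_pred)
print(scores)
```

Micro averaging favors frequent labels because it pools counts, macro treats every label equally, and weighted sits between the two by scaling each label's F1 by its support. This is why the three scores reported for one model can differ.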
Related papers
- Whisper In Focus: Enhancing Stuttered Speech Classification With Encoder Layer Optimization (2023) 0.00
- Adapting Whisper For Code-switching Through Encoding Refining And Language-aware Decoding (2024) 0.00
- Multilingual DistilWhisper: Efficient Distillation Of Multi-task Speech Models Via Language-specific Experts (2023) 8.09
- Simul-Whisper: Attention-guided Streaming Whisper With Truncation Detection (2024) 6.34
- DQ-Whisper: Joint Distillation And Quantization For Efficient Multilingual Speech Recognition (2023) 4.52
- Whisper-PMFA: Partial Multi-scale Feature Aggregation For Speaker Verification Using Whisper Models (2024) 0.00
- Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection (2024) 5.24
- FluentSpeech: Stutter-oriented Automatic Speech Editing With Context-aware Diffusion Models (2023) 12.13