Whisper In Focus: Enhancing Stuttered Speech Classification With Encoder Layer Optimization
2023 · Huma Ameer, Seemab Latif, Rabia Latif, et al.
Abstract
In recent years, advances in speech processing have produced cutting-edge deep learning algorithms with immense potential for real-world applications. The automated identification of stuttered speech is one such application that researchers are addressing with deep learning techniques. Recently, researchers have used Wav2vec2.0, a speech recognition model, to classify disfluency types in stuttered speech. Although Wav2vec2.0 has shown commendable results, its ability to generalize across all disfluency types is limited. In addition, since its base model uses 12 encoder layers, it is considered a resource-intensive model. Our study explores the capabilities of Whisper for the classification of disfluency types in stuttered speech. We make notable contributions in three pivotal areas: enhancing the quality of the SEP28-k benchmark dataset, exploring Whisper for classification, and introducing an efficient encoder layer freezing strategy. The optimi
Related papers
- Optimizing Multi-stuttered Speech Classification: Leveraging Whisper's Encoder For Efficient Parameter Reduction In Automated Assessment (2024) 0.00
- WhisperVC: Decoupled Cross-domain Alignment And Speech Generation For Low-resource Whisper-to-normal Conversion (2025) 0.00
- Adapting Whisper For Code-switching Through Encoding Refining And Language-aware Decoding (2024) 0.00
- Simul-Whisper: Attention-guided Streaming Whisper With Truncation Detection (2024) 6.34
- Speaking Clearly: A Simplified Whisper-based Codec For Low-bitrate Speech Coding (2025) 3.16
- Whisper Speaker Identification: Leveraging Pre-trained Multilingual Transformers For Robust Speaker Embeddings (2025) 0.00
- Whisper Turns Stronger: Augmenting Wav2vec 2.0 For Superior ASR In Low-resource Languages (2024) 0.00
- Multilingual DistilWhisper: Efficient Distillation Of Multi-task Speech Models Via Language-specific Experts (2023) 8.09