Stuttering Detection Using Speaker Representations And Self-supervised Contextual Embeddings
2023 · Shakeel A. Sheikh, Md Sahidullah, Fabrice Hirsch, et al.
Abstract
The adoption of advanced deep learning architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the use of speech embeddings extracted from deep learning models pre-trained on large audio datasets for different tasks. In particular, we explore audio representations obtained from the emphasized channel attention, propagation, and aggregation time delay neural network (ECAPA-TDNN) and Wav2Vec2.0 models, trained on the VoxCeleb and LibriSpeech datasets, respectively. After extracting the embeddings, we benchmark several traditional classifiers, such as K-nearest neighbour (KNN), Gaussian naive Bayes, and a neural network, on the SD tasks. Compared with standard SD systems trained only on the limited SEP-28k dataset, we obtain relative improvements of 12.08%, 28.71%, and 37.9% in unweighted average recall (UAR) over the baselines. Finally, we have shown that combining two
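The back-end stage described above (fixed-size embeddings fed to a traditional classifier, scored by UAR, and compared with a baseline as a relative improvement) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the 2-D vectors stand in for real ECAPA-TDNN or Wav2Vec2.0 embeddings, the labels `fluent`/`block` and the helper names `knn_predict`, `unweighted_average_recall`, and `relative_improvement` are hypothetical, KNN uses Euclidean distance with `k=3` by assumption, and the UAR values passed to `relative_improvement` are made up, not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy "embeddings": 2-D stand-ins for utterance-level
# ECAPA-TDNN or Wav2Vec2.0 vectors (real ones have hundreds of dimensions).
train_X = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (1.0, 1.1), (0.9, 1.0), (1.2, 0.9)]
train_y = ["fluent", "fluent", "fluent", "block", "block", "block"]
test_X = [(0.1, 0.1), (0.0, 0.0), (0.9, 0.9), (1.0, 1.0)]
test_y = ["fluent", "fluent", "fluent", "block"]  # deliberately imbalanced


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training embeddings
    (Euclidean distance, an assumption of this sketch)."""
    neighbours = sorted(
        (math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls, so each class counts equally
    no matter how many test samples it has."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


def relative_improvement(new, baseline):
    """Relative gain in percent: (new - baseline) / baseline * 100."""
    return (new - baseline) / baseline * 100.0


preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
uar = unweighted_average_recall(test_y, preds)
acc = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(preds)  # ['fluent', 'fluent', 'block', 'block']
print(f"accuracy={acc:.3f}  UAR={uar:.3f}")  # accuracy=0.750  UAR=0.833
# Hypothetical UAR values, not results from the paper:
print(f"{relative_improvement(0.56, 0.50):.1f}%")  # 12.0%
```

UAR weights every class equally, which is why it suits imbalanced SD data: here the single `block` sample is recalled perfectly and two of three `fluent` samples are recalled, giving UAR = (1 + 2/3) / 2 ≈ 0.833, while plain accuracy is 0.75.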
Related papers
- End-to-end And Self-supervised Learning For ComParE 2022 Stuttering Sub-challenge (2022)
- Adapting Self-supervised Models To Multi-talker Speech Recognition Using Speaker Embeddings (2022)
- Deep Speaker Embeddings For Far-field Speaker Recognition On Short Utterances (2020)
- Unspeech: Unsupervised Speech Context Embeddings (2018)
- The Efficacy Of Self-supervised Speech Models For Audio Representations (2022)
- Large-scale Self-supervised Speech Representation Learning For Automatic Speaker Verification (2021)
- Effect Of Attention And Self-supervised Speech Embeddings On Non-semantic Speech Tasks (2023)
- A Closer Look At Wav2vec2 Embeddings For On-device Single-channel Speech Enhancement (2024)