End-to-end And Self-supervised Learning For ComParE 2022 Stuttering Sub-challenge
2022 · Shakeel Ahmad Sheikh, Md Sahidullah, Fabrice Hirsch, et al.
Abstract
In this paper, we present end-to-end and speech-embedding-based systems trained in a self-supervised fashion for the stuttering sub-challenge of the ACM Multimedia 2022 ComParE Challenge. In particular, we exploit embeddings from the pre-trained Wav2Vec2.0 model for stuttering detection (SD) on the KSoF dataset. After embedding extraction, we benchmark several methods for SD. Our proposed self-supervised SD system achieves a UAR of 36.9% and 41.0% on the validation and test sets respectively, which is 31.32% (validation set) and 1.49% (test set) higher, in relative terms, than the best challenge baseline (CBL), DeepSpectrum. Moreover, we show that concatenating layer embeddings with Mel-frequency cepstral coefficient (MFCC) features improves the UAR further, by 33.81% and 5.45% over the CBL on the validation and test sets respectively. Finally, we demonstrate that summing information across all the layers of Wav2Vec2.0 surpasses the CBL by a relative margin of 45.91%.
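The results above are reported as UAR (unweighted average recall) and as relative gains over a baseline. As a reading aid, the following minimal sketch shows how both quantities are commonly computed. It is not the authors' evaluation code: the function names, class labels, and numbers are made up for illustration and are not taken from KSoF or from the challenge baselines.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of the per-class recalls, so every class counts equally
    regardless of how many samples it has."""
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


def relative_gain_percent(system, baseline):
    """Relative improvement of `system` over `baseline`, in percent."""
    return 100.0 * (system - baseline) / baseline


# Hypothetical toy labels (illustrative only, not KSoF data).
y_true = ["disfluent"] * 4 + ["fluent"] * 2
y_pred = ["disfluent", "disfluent", "disfluent", "fluent", "fluent", "disfluent"]

# Per-class recall: disfluent 3/4, fluent 1/2, so UAR = (0.75 + 0.5) / 2.
uar = unweighted_average_recall(y_true, y_pred)

# Hypothetical scores: a system at 0.75 against a baseline at 0.50.
gain = relative_gain_percent(0.75, 0.50)
```

On this toy data the plain accuracy is 4/6, while the UAR is lower because the smaller class is recognised less reliably; this sensitivity to minority classes is why UAR is a common choice for imbalanced tasks. If scikit-learn is available, `recall_score(y_true, y_pred, average="macro")` computes the same quantity.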
Related papers
- Stuttering Detection Using Speaker Representations And Self-supervised Contextual Embeddings (2023)
- CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-supervised Learning Of Speech Representations (2022)
- Comparing Supervised And Self-supervised Embedding For ExVo Multi-task Learning Track (2022)
- A Closer Look At Wav2vec2 Embeddings For On-device Single-channel Speech Enhancement (2024)
- Efficient Speech Quality Assessment Using Self-supervised Framewise Embeddings (2022)
- Exploring WavLM Back-ends For Speech Spoofing And Deepfake Detection (2024)
- Pretrained Semantic Speech Embeddings For End-to-end Spoken Language Understanding Via Cross-modal Teacher-student Learning (2020)
- Accent-robust Automatic Speech Recognition Using Supervised And Unsupervised Wav2vec Embeddings (2021)