Unsupervised Speech Intelligibility Assessment With Utterance Level Alignment Distance Between Teacher And Learner Wav2vec-2.0 Representations
2023 · Nayan Anand, Meenakshi Sirigiraju, Chiranjeevi Yarra
Abstract
Speech intelligibility is crucial in language learning for effective communication, so computer-assisted language learning systems need automatic speech intelligibility detection (SID). Most prior work assesses intelligibility in a supervised manner that relies on manual annotations, which cost time and money and therefore limit scalability. To overcome this, this work proposes an unsupervised approach to SID. The approach uses the alignment distance, computed with dynamic time warping (DTW) between the teacher and learner representation sequences, as a measure to separate intelligible from non-intelligible speech. The feature sequences are obtained from Wav2Vec-2.0, a current state-of-the-art self-supervised representation model. Detection accuracies of 90.37%, 92.57% and 96.58% are obtained with three alignment distance measures: mean absolute error, mean squared error and cosine distance (one minus cosine similarity), respectively.
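The alignment distance described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `frame_cost` and `dtw_alignment_distance`, the unconstrained step pattern, and the normalization by the summed sequence lengths are all assumptions made for illustration, and random arrays stand in for real Wav2Vec-2.0 feature sequences.

```python
import numpy as np


def frame_cost(teacher, learner, metric="cosine"):
    """Pairwise local cost between two (frames x dims) feature sequences."""
    if metric == "cosine":
        # Cosine distance = 1 - cosine similarity of unit-normalized frames.
        tn = teacher / np.linalg.norm(teacher, axis=1, keepdims=True)
        ln = learner / np.linalg.norm(learner, axis=1, keepdims=True)
        return 1.0 - tn @ ln.T
    diff = teacher[:, None, :] - learner[None, :, :]  # shape (T, L, D)
    if metric == "mae":
        return np.abs(diff).mean(axis=-1)
    if metric == "mse":
        return (diff ** 2).mean(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def dtw_alignment_distance(teacher, learner, metric="cosine"):
    """Accumulated DTW cost between teacher and learner sequences.

    Assumption: the total cost is divided by (T + L) so that utterances
    of different durations are comparable; the abstract does not say
    which normalization, if any, the paper uses.
    """
    cost = frame_cost(teacher, learner, metric)
    n, m = cost.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard step pattern: insertion, deletion, or match.
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher = rng.normal(size=(50, 768))  # stand-in for teacher features
    learner = rng.normal(size=(65, 768))  # stand-in for learner features
    for metric in ("mae", "mse", "cosine"):
        print(metric, dtw_alignment_distance(teacher, learner, metric))
```

A smaller distance means the learner's utterance aligns more closely with the teacher's. Turning the distance into an intelligible versus non-intelligible decision needs a threshold, which the abstract does not specify.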
Related papers
- Exploring Wav2vec 2.0 On Speaker Verification And Language Identification (2020) 15.59
- Unsupervised Speech Recognition (2021) 0.00
- Wav2vec: Unsupervised Pre-training For Speech Recognition (2019) 0.00
- Data2vec-aqc: Search For The Right Teaching Assistant In The Teacher-student Training Setup (2022) 5.87
- A Noise-robust Self-supervised Pre-training Model Based Speech Representation Learning For Automatic Speech Recognition (2022) 11.19
- Accent-robust Automatic Speech Recognition Using Supervised And Unsupervised Wav2vec Embeddings (2021) 0.00
- Vq-wav2vec: Self-supervised Learning Of Discrete Speech Representations (2019) 0.00
- Ccc-wav2vec 2.0: Clustering Aided Cross Contrastive Self-supervised Learning Of Speech Representations (2022) 7.81