Supervised Acoustic Embeddings And Their Transferability Across Languages
2023 · Sreepratha Ram, Hanan Aldarmaki
Abstract
In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variation and noise, which is challenging in low-resource settings. Self-supervised pre-training has been proposed as a way to improve both supervised and unsupervised speech recognition, through frame-level feature representations as well as Acoustic Word Embeddings (AWEs) for variable-length segments. However, self-supervised models alone cannot fully separate the linguistic content from other factors, because they are trained to optimize indirect objectives. In this work, we experiment with different pre-trained self-supervised features as input to AWE models and show that they work best within a supervised framework. Models trained on English can be transferred to other languages with no adaptation and outperform self-supervised models trained solely on the target languages.
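To make the AWE idea concrete, here is a minimal sketch, not the authors' model, of the two ingredients the abstract relies on: mapping a variable-length sequence of frame-level features to one fixed-size vector, and a supervised objective that pulls same-word segments together and pushes different-word segments apart. Everything here is an assumption for illustration: `mean_pool` stands in for a trained encoder (a real AWE model would learn this mapping), `triplet_loss` is one common cosine-hinge formulation for supervised AWEs, and the `margin=0.4` value and the toy "features" are hypothetical.

```python
import math
import random
from typing import List

# One frame-level feature vector, e.g. an output of a self-supervised layer (assumed).
Frame = List[float]


def mean_pool(frames: List[Frame]) -> List[float]:
    """Map a variable-length segment (T frames x D dims) to a single D-dim vector.

    Hypothetical stand-in for a learned AWE encoder: it simply averages frames.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (na * nb)


def triplet_loss(anchor: List[float], positive: List[float],
                 negative: List[float], margin: float = 0.4) -> float:
    """Cosine hinge loss (hypothetical margin): the same-word pair (anchor, positive)
    should be more similar than the different-word pair (anchor, negative) by `margin`."""
    return max(0.0, margin + cosine(anchor, negative) - cosine(anchor, positive))


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data (hypothetical): each "word" is a prototype vector plus frame-level noise.
    proto_a = [1.0, 0.0, 0.5]
    proto_b = [0.0, 1.0, -0.5]

    def segment(proto: List[float], length: int) -> List[Frame]:
        return [[x + rng.gauss(0.0, 0.1) for x in proto] for _ in range(length)]

    # Segments of different lengths all map to 3-dim embeddings.
    a1 = mean_pool(segment(proto_a, 7))
    a2 = mean_pool(segment(proto_a, 12))
    b1 = mean_pool(segment(proto_b, 9))
    print(f"same-word cosine: {cosine(a1, a2):.3f}")
    print(f"diff-word cosine: {cosine(a1, b1):.3f}")
    print(f"triplet loss:     {triplet_loss(a1, a2, b1):.3f}")
```

In a supervised setup the loss would be backpropagated into the encoder so that segments of the same word cluster regardless of speaker; in the sketch the pooling has no parameters, so the loss only scores a fixed mapping.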