Improving Acoustic Word Embeddings Through Correspondence Training Of Self-supervised Speech Representations
2024 · Amit Meghanani, Thomas Hain
Abstract
Acoustic word embeddings (AWEs) are vector representations of spoken words. An effective method for obtaining AWEs is the Correspondence Auto-Encoder (CAE). In the past, the CAE method has been associated with traditional MFCC features. Representations obtained from self-supervised learning (SSL)-based speech models such as HuBERT and Wav2vec2 outperform MFCC features in many downstream tasks. However, they have not been well studied in the context of learning AWEs. This work explores the effectiveness of CAE with SSL-based speech representations to obtain improved AWEs. Additionally, the capabilities of SSL-based speech models are explored in cross-lingual scenarios for obtaining AWEs. Experiments are conducted on five languages: Polish, Portuguese, Spanish, French, and English. The HuBERT-based CAE model achieves the best results for word discrimination in all languages, despite HuBERT being pre-trained on English only. Also, the HuBERT-based CAE model works well in cross-lingual sett
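The correspondence idea behind a CAE can be illustrated with a small sketch. This is not the authors' implementation: it assumes the usual CAE setup, in which the encoder receives one spoken instance of a word and the decoder is trained to reconstruct a different instance of the same word, and it assumes cosine distance for comparing embeddings in word discrimination. The names `correspondence_pairs` and `cosine_distance` and the toy feature values are hypothetical, and the frame vectors stand in for SSL features such as HuBERT outputs.

```python
import math
from collections import defaultdict
from itertools import permutations


def correspondence_pairs(segments):
    """Build (label, input, target) training pairs for correspondence training.

    `segments` is a list of (word_label, frames), where `frames` is a list of
    per-frame feature vectors. Each pair joins two DIFFERENT instances of the
    same word: the first is the encoder input, the second the decoder target.
    """
    by_word = defaultdict(list)
    for label, frames in segments:
        by_word[label].append(frames)

    pairs = []
    for label, instances in by_word.items():
        # Ordered pairs, so every instance is used as input and as target.
        for i, j in permutations(range(len(instances)), 2):
            pairs.append((label, instances[i], instances[j]))
    return pairs


def cosine_distance(u, v):
    """Cosine distance between two embeddings (assumed comparison metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Toy data: two instances of "hola" (different lengths) and one of "casa".
segments = [
    ("hola", [[0.1, 0.2], [0.3, 0.4]]),
    ("hola", [[0.2, 0.1], [0.4, 0.3], [0.5, 0.5]]),
    ("casa", [[0.9, 0.8]]),
]
pairs = correspondence_pairs(segments)
```

With this toy data only "hola" yields pairs, since a word needs at least two spoken instances to supply both an input and a target. In a full model, the fixed-dimensional encoder output for each segment would serve as its AWE, and a function like `cosine_distance` would decide whether two segments are the same word.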
Related papers
- A Comparison Of Self-supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings (2020)
- Analyzing Acoustic Word Embeddings From Pre-trained Self-supervised Speech Models (2022)
- Layer-wise Analysis Of Self-supervised Acoustic Word Embeddings: A Study On Speech Emotion Recognition (2024)
- Supervised Acoustic Embeddings And Their Transferability Across Languages (2023)
- Leveraging Multilingual Transfer For Unsupervised Semantic Acoustic Word Embeddings (2023)
- Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study (2021)
- Efficient Infusion Of Self-supervised Representations In Automatic Speech Recognition (2024)
- Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints In Encoder-decoder Models (2018)