Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study
2021 · Badr M. Abdullah, Marius Mosbach, Iuliia Zaitova, et al.
Abstract
Several variants of deep neural networks have been successfully employed for building parametric models that project variable-duration spoken word segments onto fixed-size vector representations, or acoustic word embeddings (AWEs). However, it remains unclear to what degree we can rely on the distance in the emerging AWE space as an estimate of word-form similarity. In this paper, we ask: does the distance in the acoustic embedding space correlate with phonological dissimilarity? To answer this question, we empirically investigate the performance of supervised approaches for AWEs with different neural architectures and learning objectives. We train AWE models in controlled settings for two languages (German and Czech) and evaluate the embeddings on two tasks: word discrimination and phonological similarity. Our experiments show that (1) the distance in the embedding space in the best cases only moderately correlates with phonological distance, and (2) improving the performance on the w
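The central question of the abstract, whether distance in the embedding space tracks phonological dissimilarity, can be illustrated with a small sketch. The abstract does not say which distance measures or correlation statistic the authors use, so everything here is an assumption made for illustration: phonological distance is taken as Levenshtein edit distance over phoneme sequences, embedding distance as cosine distance, and agreement as Spearman rank correlation. The function names, the toy German words, and the three-dimensional vectors are hypothetical and do not come from the paper.

```python
import math
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (assumed measure)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / (norm_u * norm_v)


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for constant input."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def distance_correlation(phonemes, embeddings):
    """Correlate phonological and embedding distances over all word pairs."""
    pairs = list(combinations(sorted(phonemes), 2))
    phon = [edit_distance(phonemes[a], phonemes[b]) for a, b in pairs]
    emb = [cosine_distance(embeddings[a], embeddings[b]) for a, b in pairs]
    return spearman(phon, emb)


if __name__ == "__main__":
    # Hypothetical toy data: phoneme transcriptions and made-up 3-d vectors.
    phonemes = {
        "Hand": ["h", "a", "n", "t"],
        "Land": ["l", "a", "n", "t"],
        "Hund": ["h", "ʊ", "n", "t"],
        "Buch": ["b", "uː", "x"],
    }
    embeddings = {
        "Hand": [0.9, 0.1, 0.2],
        "Land": [0.8, 0.3, 0.1],
        "Hund": [0.7, 0.1, 0.6],
        "Buch": [0.1, 0.9, 0.4],
    }
    print(distance_correlation(phonemes, embeddings))
```

A coefficient near 1 would mean that word pairs that are further apart phonologically are also further apart in the embedding space; the abstract reports that, in the best cases, this relationship is only moderate.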
Related papers
- How Familiar Does That Sound? Cross-lingual Representational Similarity Analysis Of Acoustic Word Embeddings (2021)
- Leveraging Multilingual Transfer For Unsupervised Semantic Acoustic Word Embeddings (2023)
- Improving Acoustic Word Embeddings Through Correspondence Training Of Self-supervised Speech Representations (2024)
- Layer-wise Analysis Of Self-supervised Acoustic Word Embeddings: A Study On Speech Emotion Recognition (2024)
- Supervised Acoustic Embeddings And Their Transferability Across Languages (2023)
- A Comparison Of Self-supervised Speech Representations As Input Features For Unsupervised Acoustic Word Embeddings (2020)
- Discriminative Acoustic Word Embeddings: Recurrent Neural Network-based Approaches (2016)
- Asymmetric Proxy Loss For Multi-view Acoustic Word Embeddings (2022)