Asymmetric Proxy Loss For Multi-view Acoustic Word Embeddings
2022 · Myunghun Jung, Hoirin Kim
Abstract
Acoustic word embeddings (AWEs) are discriminative representations of speech segments, and the learned embedding space reflects the phonetic similarity between words. With multi-view learning, where text labels are treated as supplementary input, AWEs are jointly trained with acoustically grounded word embeddings (AGWEs). In this paper, we expand the multi-view approach into a proxy-based framework for deep metric learning by equating AGWEs with proxies. A simple modification in computing the similarity matrix allows the general pair weighting framework to formulate the data-to-proxy relationship. Under the systematized framework, we propose an asymmetric-proxy loss that combines different parts of loss functions asymmetrically while keeping their merits. It follows two assumptions: the optimal function for anchor-positive pairs may differ from the one for anchor-negative pairs, and a proxy may have a different impact when it substitutes for different positions in the triplet. We present comparat
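The abstract's two technical ideas can be illustrated with a small sketch, under stated assumptions. First, the "simple modification" of the similarity matrix: a pair-based loss compares a batch of AWEs against itself, whereas a proxy-based loss compares AWEs against AGWEs used as proxies (one proxy per word class). Second, "asymmetric" combination: the positive (anchor-to-own-proxy) term and the negative (anchor-to-other-proxies) terms use different functional forms. The function `asymmetric_proxy_loss`, its softplus/log-sum-exp split, and the parameters `alpha`, `beta`, and `margin` are hypothetical choices made for illustration only; this is not the loss defined in the paper.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def similarity_matrix(rows: Sequence[Sequence[float]],
                      cols: Sequence[Sequence[float]]) -> Matrix:
    """Cosine similarity S[i][j] between rows[i] and cols[j].

    Pair-based setting: similarity_matrix(awes, awes)   -> shape (n, n)
    Proxy-based setting: similarity_matrix(awes, agwes) -> shape (n, c)
    """
    rn = [l2_normalize(r) for r in rows]
    cn = [l2_normalize(c) for c in cols]
    return [[sum(a * b for a, b in zip(r, c)) for c in cn] for r in rn]


def asymmetric_proxy_loss(awes: Sequence[Sequence[float]],
                          labels: Sequence[int],
                          agwes: Sequence[Sequence[float]],
                          alpha: float = 32.0,
                          beta: float = 32.0,
                          margin: float = 0.1) -> float:
    """HYPOTHETICAL asymmetric data-to-proxy loss (illustration only).

    labels[i] is the index of the AGWE proxy that matches awes[i].
    """
    sim = similarity_matrix(awes, agwes)  # data-to-proxy similarities, (n, c)
    total = 0.0
    for i, y in enumerate(labels):
        s_pos = sim[i][y]
        # Positive part: softplus on the single anchor-to-own-proxy similarity.
        pos = math.log1p(math.exp(-alpha * (s_pos - margin))) / alpha
        # Negative part: log-sum-exp over all anchor-to-other-proxy similarities.
        neg_sum = sum(math.exp(beta * (sim[i][j] + margin))
                      for j in range(len(agwes)) if j != y)
        neg = math.log1p(neg_sum) / beta
        total += pos + neg
    return total / len(labels)
```

For example, with proxies `[[1.0, 0.0], [0.0, 1.0]]` and AWEs equal to those proxies, labels `[0, 1]` (each AWE matched to its own proxy) give a smaller loss than the swapped labels `[1, 0]`. Using different functions for the positive and negative terms is what makes the sketch "asymmetric"; any real implementation would follow the paper's own formulation and would typically run batched on a tensor library rather than in pure Python.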
Related papers
- Leveraging Multilingual Transfer For Unsupervised Semantic Acoustic Word Embeddings (2023)
- Relational Proxy Loss For Audio-text Based Keyword Spotting (2024)
- Additional Shared Decoder On Siamese Multi-view Encoders For Learning Acoustic Word Embeddings (2019)
- Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study (2021)
- Masked Proxy Loss For Text-independent Speaker Verification (2020)
- Improvements To Embedding-matching Acoustic-to-word ASR Using Multiple-hypothesis Pronunciation-based Embeddings (2022)
- Improving Acoustic Word Embeddings Through Correspondence Training Of Self-supervised Speech Representations (2024)
- Layer-wise Analysis Of Self-supervised Acoustic Word Embeddings: A Study On Speech Emotion Recognition (2024)