Additional Shared Decoder On Siamese Multi-view Encoders For Learning Acoustic Word Embeddings
2019 · Myunghun Jung, Hyungjun Lim, Jahyun Goo, et al.
Abstract
Acoustic word embeddings, fixed-dimensional vector representations of arbitrary-length words, have attracted increasing interest in query-by-example spoken term detection. Recently, based on the fact that the orthography of text labels partly reflects the phonetic similarity between words' pronunciations, a multi-view approach has been introduced that jointly learns acoustic and text embeddings. It showed that discriminative embeddings can be learned by designing an objective that takes text labels into account as well as word segments. In this paper, we propose a network architecture that extends the multi-view approach by combining Siamese multi-view encoders with a shared decoder network, to maximize the effect of the relationship between acoustic and text embeddings in the embedding space. Discriminatively trained with a multi-view triplet loss and a decoding loss, our proposed approach achieves better performance on the acoustic word discrimination task with the WSJ dataset, resulting in
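The training objective named in the abstract combines a multi-view triplet loss with a decoding loss. The abstract does not give the exact formulation, so the following is only a minimal sketch under stated assumptions: cosine distance between embeddings, a hinge margin of 0.4, a simple weighted sum of the two terms, and the names `cosine_distance`, `multiview_triplet_loss`, and `total_loss` are all illustrative choices rather than details taken from the paper.

```python
import numpy as np


def cosine_distance(a, b):
    """Row-wise cosine distance (1 - cosine similarity) for two batches."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - np.sum(a * b, axis=1)


def multiview_triplet_loss(acoustic, text_pos, text_neg, margin=0.4):
    """Hinge loss over (acoustic anchor, matching text, mismatched text).

    Each acoustic embedding should be closer to the text embedding of its
    own word than to the text embedding of a different word, by at least
    `margin`. The margin value is an assumption, not taken from the paper.
    """
    d_pos = cosine_distance(acoustic, text_pos)
    d_neg = cosine_distance(acoustic, text_neg)
    return float(np.mean(np.maximum(0.0, margin + d_pos - d_neg)))


def total_loss(triplet, decoding, decoding_weight=1.0):
    """Assumed combination: triplet term plus a weighted decoding term."""
    return triplet + decoding_weight * decoding


# Toy batch: two words with orthogonal 2-d embeddings in both views.
acoustic = np.array([[1.0, 0.0], [0.0, 1.0]])
text_same = np.array([[1.0, 0.0], [0.0, 1.0]])
text_other = np.array([[0.0, 1.0], [1.0, 0.0]])

well_separated = multiview_triplet_loss(acoustic, text_same, text_other)
mismatched = multiview_triplet_loss(acoustic, text_other, text_same)
```

In this toy batch the correctly paired views already satisfy the margin, so `well_separated` is zero, while swapping the positive and negative text embeddings gives a positive loss. The decoding term is left as a plain number here because the abstract does not say what the shared decoder reconstructs.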
Related papers
- Discriminative Acoustic Word Embeddings: Recurrent Neural Network-based Approaches (2016) · 0.00
- Improved Audio Embeddings By Adjacency-based Clustering With Applications In Spoken Term Detection (2018) · 0.00
- Asymmetric Proxy Loss For Multi-view Acoustic Word Embeddings (2022) · 2.26
- Improved Acoustic Word Embeddings For Zero-resource Languages Using Multilingual Transfer (2020) · 7.81
- Acoustic Word Embedding System For Code-switching Query-by-example Spoken Term Detection (2020) · 3.58
- Truly Unsupervised Acoustic Word Embeddings Using Weak Top-down Constraints In Encoder-decoder Models (2018) · 0.00
- Acoustic Neighbor Embeddings (2020) · 0.00
- Using Multi-task Learning To Improve The Performance Of Acoustic-to-word And Conventional Hybrid Models (2019) · 0.00