Learning Acoustic Word Embeddings With Temporal Context For Query-by-example Speech Search
2018 · Yougen Yuan, Cheung-Chi Leung, Lei Xie, et al.
Abstract
We propose to learn acoustic word embeddings with temporal context for query-by-example (QbE) speech search. The temporal context includes the leading and trailing word sequences of a word. We assume that there exist spoken word pairs in the training database. We pad the word pairs with their original temporal context to form fixed-length speech segment pairs. We obtain the acoustic word embeddings through a deep convolutional neural network (CNN) which is trained on the speech segment pairs with a triplet loss. Shifting a fixed-length analysis window through the search content, we obtain a running sequence of embeddings. In this way, searching for the spoken query is equivalent to the matching of acoustic word embeddings. The experiments show that our proposed acoustic word embeddings learned with temporal context are effective in QbE speech search. They outperform the state-of-the-art frame-level feature representations and reduce run-time computation since no dynamic time warping is
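The pipeline described in the abstract (triplet-loss training, a running sequence of embeddings from a shifted fixed-length window, and search by embedding matching) can be illustrated with a minimal sketch. Several pieces here are assumptions rather than details from the paper: the cosine distance, the margin value, the window and shift sizes, the 39-dimensional features, and above all `embed`, a hypothetical mean-over-time placeholder standing in for the trained CNN. Padding the query with its temporal context is omitted.

```python
import numpy as np

def cos_dist(a, b):
    """Cosine distance between two 1-D vectors (distance choice is an assumption)."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge-style triplet loss: pull the same-word pair together and push
    the different-word pair apart. The margin value is an assumption."""
    return max(0.0, margin + cos_dist(anchor, positive) - cos_dist(anchor, negative))

def sliding_segments(features, window, shift):
    """Cut a (num_frames, feat_dim) array into fixed-length windows."""
    starts = range(0, features.shape[0] - window + 1, shift)
    return np.stack([features[s:s + window] for s in starts])

def embed(segments):
    """Hypothetical stand-in for the trained CNN: averages frames over time.
    Input (num_segments, window, feat_dim), output (num_segments, feat_dim)."""
    return segments.mean(axis=1)

def search(query_emb, content_embs):
    """Return the index and cosine distance of the best-matching window."""
    q = query_emb / np.linalg.norm(query_emb)
    c = content_embs / np.linalg.norm(content_embs, axis=1, keepdims=True)
    dists = 1.0 - c @ q
    best = int(np.argmin(dists))
    return best, float(dists[best])

# Toy usage: random "features" replace real speech; the query is cut from the content.
rng = np.random.default_rng(0)
content = rng.normal(size=(200, 39))           # 200 frames of 39-dim features
window, shift = 40, 10                         # illustrative sizes, not from the paper
content_embs = embed(sliding_segments(content, window, shift))
query_emb = embed(content[60:100][None])[0]    # one fixed-length query segment
best_index, best_dist = search(query_emb, content_embs)
```

Because each window is reduced to a single vector, matching the query against the content costs one distance computation per window, with no frame-by-frame alignment step.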
Related papers
- Query-by-example Search With Discriminative Neural Acoustic Word Embeddings (2017) 12.40
- Semantic Query-by-example Speech Search Using Visual Grounding (2019) 7.81
- Acoustic Word Embedding System For Code-switching Query-by-example Spoken Term Detection (2020) 3.58
- Neural Network Based End-to-end Query By Example Spoken Term Detection (2019) 0.00
- Query-by-example Keyword Spotting Using Spectral-temporal Graph Attentive Pooling And Multi-task Learning (2024) 0.00
- Improving Query-by-vocal Imitation With Contrastive Learning And Audio Pretraining (2024) 0.00
- Learning Word Embeddings From Speech (2017) 0.00
- Multilingual Bottleneck Features For Query By Example Spoken Term Detection (2019) 9.23