Neural Network Based End-to-end Query By Example Spoken Term Detection
2019 · Dhananjay Ram, Lesly Miculicich, Hervé Bourlard
Abstract
This paper focuses on the problem of query by example spoken term detection (QbE-STD) in a zero-resource scenario. State-of-the-art approaches primarily rely on dynamic time warping (DTW) based template matching, using phone posterior or bottleneck features extracted from a deep neural network (DNN). We use both monolingual and multilingual bottleneck features, and show that multilingual features perform better as more training languages are added. It has previously been shown that DTW-based matching can be replaced with convolutional neural network (CNN) based matching when posterior features are used. Here, we show that CNN-based matching also outperforms DTW-based matching when bottleneck features are used. In this setup, however, the feature extraction and pattern matching stages of our QbE-STD system are optimized independently of each other. We propose to integrate these two stages in a fully neural-network-based end-to-end learning framework, enabling joint optimization of both stages simultaneously.
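The DTW-based template matching baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that frame-level features (posterior or bottleneck vectors) have already been extracted. It also assumes a cosine distance between frames and uses subsequence DTW, so that the query may start and end anywhere in the test utterance. The function names and the path-length normalization are hypothetical choices made for this sketch.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def cosine_distance(a: Frame, b: Frame) -> float:
    """Return 1 - cosine similarity between two feature frames (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero frame as uncorrelated with everything
    return 1.0 - dot / (norm_a * norm_b)


def distance_matrix(query: Sequence[Frame], test: Sequence[Frame]) -> List[List[float]]:
    """Frame-by-frame distances: rows index query frames, columns index test frames."""
    return [[cosine_distance(q, t) for t in test] for q in query]


def subsequence_dtw(dist: List[List[float]]) -> Tuple[float, int]:
    """Find the best-matching region of the test utterance for the whole query.

    Returns (length-normalized cost, index of the last matched test frame).
    """
    m, n = len(dist), len(dist[0])
    cost = [[math.inf] * n for _ in range(m)]
    length = [[0] * n for _ in range(m)]

    # Free start: the query's first frame may align with any test frame.
    for j in range(n):
        cost[0][j] = dist[0][j]
        length[0][j] = 1

    for i in range(1, m):
        for j in range(n):
            # Predecessors: vertical step, plus horizontal and diagonal if j > 0.
            candidates = [(cost[i - 1][j], length[i - 1][j])]
            if j > 0:
                candidates.append((cost[i][j - 1], length[i][j - 1]))
                candidates.append((cost[i - 1][j - 1], length[i - 1][j - 1]))
            # Pick the predecessor with the lowest accumulated cost.
            prev_cost, prev_len = min(candidates)
            cost[i][j] = prev_cost + dist[i][j]
            length[i][j] = prev_len + 1

    # Free end: the query's last frame may align with any test frame.
    # Dividing by path length (a heuristic assumed here) keeps paths comparable.
    best_end = min(range(n), key=lambda j: cost[m - 1][j] / length[m - 1][j])
    return cost[m - 1][best_end] / length[m - 1][best_end], best_end


def match_score(query: Sequence[Frame], test: Sequence[Frame]) -> Tuple[float, int]:
    """Score a spoken query against a test utterance; lower means a closer match."""
    return subsequence_dtw(distance_matrix(query, test))


if __name__ == "__main__":
    # Toy 2-D "features": the query appears at test frames 1-2.
    query = [[1.0, 0.0], [0.0, 1.0]]
    test = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    score, end = match_score(query, test)
    print(f"score={score:.3f}, match ends at test frame {end}")
    # prints: score=0.000, match ends at test frame 2
```

A lower score means a closer match. A detection decision would compare the score against a threshold tuned on development data. The CNN-based matching described in the abstract replaces this hand-designed DTW search with a learned classifier, and it is not sketched here.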
Related papers
- Multilingual Bottleneck Features For Query By Example Spoken Term Detection (2019)
- Cross-lingual Query-by-example Spoken Term Detection: A Transformer-based Approach (2024)
- Query-by-example Spoken Term Detection Using Attention-based Multi-hop Networks (2017)
- A Nonparametric Bayesian Approach For Spoken Term Detection By Example Query (2016)
- Query-by-example Search With Discriminative Neural Acoustic Word Embeddings (2017)
- Learning Acoustic Word Embeddings With Temporal Context For Query-by-example Speech Search (2018)
- Query-by-example Keyword Spotting Using Spectral-temporal Graph Attentive Pooling And Multi-task Learning (2024)
- Semantic Query-by-example Speech Search Using Visual Grounding (2019)