Multilingual Bottleneck Features For Query By Example Spoken Term Detection
2019 · Dhananjay Ram, Lesly Miculicich, Hervé Bourlard
Abstract
State-of-the-art solutions to query by example spoken term detection (QbE-STD) usually rely on bottleneck feature representations of the query and audio document to perform dynamic time warping (DTW) based template matching. Here, we present a study of QbE-STD performance using several monolingual as well as multilingual bottleneck features extracted from feed-forward networks. Then, we propose to employ residual networks (ResNet) to estimate the bottleneck features and show significant improvements over the corresponding feed-forward network based features. The neural networks are trained on the GlobalPhone corpus, and QbE-STD experiments are performed on the very challenging QUESST 2014 database.
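To make the DTW-based template matching mentioned in the abstract concrete, here is a minimal sketch of subsequence DTW between a query and an audio document, each given as a matrix of per-frame feature vectors. It is not the authors' implementation: the cosine distance, the normalization by query length, and the function names `cosine_distance_matrix` and `subsequence_dtw_score` are assumptions chosen for illustration, and the features are taken to be already extracted.

```python
import numpy as np


def cosine_distance_matrix(query, doc):
    """Pairwise cosine distance between query frames (m x d) and document frames (n x d)."""
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = doc / np.linalg.norm(doc, axis=1, keepdims=True)
    return 1.0 - q @ d.T


def subsequence_dtw_score(query, doc):
    """Return (score, end_frame) of the best match of the query inside the document.

    Lower scores mean a better match. The query may start at any document
    frame, so the first row of the accumulated cost is not accumulated.
    The score is normalized by query length (an assumption of this sketch).
    """
    cost = cosine_distance_matrix(query, doc)
    m, n = cost.shape
    acc = np.empty((m, n))
    acc[0, :] = cost[0, :]  # free start anywhere in the document
    for i in range(1, m):
        acc[i, 0] = acc[i - 1, 0] + cost[i, 0]
        for j in range(1, n):
            acc[i, j] = cost[i, j] + min(
                acc[i - 1, j],      # advance in the query only
                acc[i, j - 1],      # advance in the document only
                acc[i - 1, j - 1],  # advance in both
            )
    end = int(np.argmin(acc[-1]))  # free end anywhere in the document
    return acc[-1, end] / m, end
```

In a detection setting, the returned score would be compared against a threshold to decide whether the query occurs in the document, and `end` gives the document frame where the best-matching region ends.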
Related papers
- Neural Network Based End-to-end Query By Example Spoken Term Detection (2019) · 0.00
- Cross-lingual Query-by-example Spoken Term Detection: A Transformer-based Approach (2024) · 0.00
- Query-by-example Spoken Term Detection Using Attention-based Multi-hop Networks (2017) · 9.23
- A Nonparametric Bayesian Approach For Spoken Term Detection By Example Query (2016) · 0.00
- Query-by-example Keyword Spotting Using Spectral-temporal Graph Attentive Pooling And Multi-task Learning (2024) · 0.00
- Time-contrastive Learning Based Deep Bottleneck Features For Text-dependent Speaker Verification (2019) · 9.92
- Learning Acoustic Word Embeddings With Temporal Context For Query-by-example Speech Search (2018) · 9.92
- Query-by-example Search With Discriminative Neural Acoustic Word Embeddings (2017) · 12.40