Query-by-example Spoken Term Detection Using Attention-based Multi-hop Networks
2017 · Chia-Wei Ao, Hung-Yi Lee
Abstract
Retrieving spoken content with spoken queries, or query-by-example spoken term detection (STD), is attractive because it makes possible the matching of signals directly on the acoustic level without transcribing them into text. Here, we propose an end-to-end query-by-example STD model based on an attention-based multi-hop network, whose input is a spoken query and an audio segment containing several utterances; the output states whether the audio segment includes the query. The model can be trained in either a supervised scenario using labeled data, or in an unsupervised fashion. In the supervised scenario, we find that the attention mechanism and multiple hops improve performance, and that the attention weights indicate the time span of the detected terms. In the unsupervised setting, the model mimics the behavior of the existing query-by-example STD system, yielding performance comparable to the existing system but with a lower search time complexity.
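The multi-hop attention idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the encoders, the attention scoring function, or how the query state is updated between hops. The sketch assumes the query and each audio frame have already been encoded as fixed-length vectors, uses dot-product attention, and updates the query state by adding the attended context. The names `multi_hop_attention`, `query_vec`, `frame_vecs`, and `hops` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def multi_hop_attention(query_vec, frame_vecs, hops=2):
    """Attend over the audio frames `hops` times (hypothetical sketch).

    Returns the final query state and the attention weights of the last hop.
    Assumption: dot-product scoring and an additive state update; the paper
    may use different choices.
    """
    state = list(query_vec)
    weights = []
    for _ in range(hops):
        # Score every frame against the current query state.
        weights = softmax([dot(state, frame) for frame in frame_vecs])
        # Attention-weighted sum of the frames.
        context = [
            sum(w * frame[d] for w, frame in zip(weights, frame_vecs))
            for d in range(len(state))
        ]
        # Fold the context into the state so the next hop can refine it.
        state = [s + c for s, c in zip(state, context)]
    return state, weights


if __name__ == "__main__":
    # Toy vectors, invented for illustration only.
    query = [1.0, 0.0]
    frames = [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]]
    final_state, attention = multi_hop_attention(query, frames, hops=2)
    # The frame with the highest weight suggests where the term may occur.
    print(attention.index(max(attention)), attention)
```

In a trained model, a classifier on the final state would produce the yes/no decision described in the abstract, and the frames with the highest attention weights would indicate the time span of the detected term.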
Related papers
- Neural Network Based End-to-end Query By Example Spoken Term Detection (2019) 0.00
- Cross-lingual Query-by-example Spoken Term Detection: A Transformer-based Approach (2024) 0.00
- A Nonparametric Bayesian Approach For Spoken Term Detection By Example Query (2016) 0.00
- Multilingual Bottleneck Features For Query By Example Spoken Term Detection (2019) 9.23
- Query-by-example Search With Discriminative Neural Acoustic Word Embeddings (2017) 12.40
- Semantic Query-by-example Speech Search Using Visual Grounding (2019) 7.81
- BEST-STD2.0: Balanced And Efficient Speech Tokenizer For Spoken Term Detection (2025) 0.00
- Query-by-example Keyword Spotting System Using Multi-head Attention And Softtriple Loss (2021) 11.39