Speaker Retrieval In The Wild: Challenges, Effectiveness And Robustness
2025 · Erfan Loweimi, Mengjie Qian, Kate Knill, et al.
Abstract
Publicly available and company-owned audio/video archives are increasingly abundant, which makes efficient access to desired content and information retrieval from these archives increasingly important. This paper investigates the challenges, solutions, effectiveness, and robustness of speaker retrieval systems developed "in the wild", which involves addressing two primary challenges: extracting task-relevant labels from limited metadata for system development and evaluation, and handling the unconstrained acoustic conditions encountered in the archive, which range from quiet studios to adverse noisy environments. While we focus on the publicly available BBC Rewind archive (spanning 1948 to 1979), our framework addresses the broader problem of speaker retrieval on extensive and possibly aged archives with no control over content or acoustic conditions. Typically, these archives offer only a brief and general file description, mostly inadequate for specific applications like speak
Related papers
- Audio Retrieval With Natural Language Queries: A Benchmark Study (2021)
- Data Leakage In Cross-modal Retrieval Training: A Case Study (2023)
- Talk, Don't Write: A Study Of Direct Speech-based Image Retrieval (2021)
- Large-scale Speaker Retrieval On Random Speaker Variability Subspace (2018)
- Improving Natural-language-based Audio Retrieval With Transfer Learning And Audio & Text Augmentations (2022)
- Advancing Natural-language Based Audio Retrieval With PaSST And Large Audio-caption Data Sets (2023)
- Matching Text And Audio Embeddings: Exploring Transfer-learning Strategies For Language-based Audio Retrieval (2022)
- VoxRAG: A Step Toward Transcription-free RAG Systems In Spoken Question Answering (2025)