Improving Noise Robustness For Spoken Content Retrieval Using Semi-supervised ASR And N-best Transcripts For BERT-based Ranking Models
2023 · Yasufumi Moriya, Gareth J. F. Jones
Abstract
BERT-based re-ranking and dense retrieval (DR) systems have been shown to improve search effectiveness for spoken content retrieval (SCR). However, both methods can still show a reduction in effectiveness when using ASR transcripts in comparison to accurate manual transcripts. We find that a known-item search task on the How2 dataset of spoken instruction videos shows a reduction in mean reciprocal rank (MRR) scores of 10-14%. As a potential method to reduce this disparity, we investigate the use of semi-supervised ASR transcripts and N-best ASR transcripts to mitigate ASR errors for spoken search using BERT-based ranking. Semi-supervised ASR transcripts brought 2-5.5% MRR improvements over standard ASR transcripts and our N-best early fusion methods for BERT DR systems improved MRR by 3-4%. Combining semi-supervised transcripts with N-best early fusion for BERT DR reduced the MRR gap in search effectiveness between manual and ASR transcripts by more than 50% from 14.32% to 6.58%.
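The abstract's headline figures are stated in terms of MRR and a relative reduction of the manual-versus-ASR gap. As a minimal sketch, assuming each known-item query has exactly one target document and that the "more than 50%" figure is the relative change between the two quoted gaps, the arithmetic can be written out as follows. The function names and the toy rankings are illustrative and are not taken from the paper.

```python
def mean_reciprocal_rank(ranked_lists, target_ids):
    """Average of 1/rank of the single target item for each query.

    A query whose target does not appear in its ranking contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranking, target in zip(ranked_lists, target_ids):
        for position, doc_id in enumerate(ranking, start=1):
            if doc_id == target:
                total += 1.0 / position
                break
    return total / len(ranked_lists)


def relative_gap_reduction(gap_before, gap_after):
    """Fraction by which a gap shrinks, relative to its original size."""
    return (gap_before - gap_after) / gap_before


# Toy rankings (hypothetical): targets found at rank 1 and rank 2.
toy_mrr = mean_reciprocal_rank([["a", "b"], ["c", "d"]], ["a", "d"])

# Gap figures quoted in the abstract: 14.32% before, 6.58% after.
reduction = relative_gap_reduction(14.32, 6.58)
```

Under this reading, going from a 14.32% gap to a 6.58% gap is a relative reduction of roughly 54%, which is consistent with the abstract's "more than 50%".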
Related papers
- SDR: Efficient Neural Re-ranking Using Succinct Document Representation (2021)
- Approximate Nearest Neighbor Negative Contrastive Learning For Dense Text Retrieval (2020)
- Self-supervised Contrastive BERT Fine-tuning For Fusion-based Reviewed-item Retrieval (2023)
- DS@GT At TREC TOT 2025: Bridging Vague Recollection With Fusion Retrieval And Learned Reranking (2026)
- Enhancing The Ranking Context Of Dense Retrieval Methods Through Reciprocal Nearest Neighbors (2023)
- On Approximate Nearest Neighbour Selection For Multi-stage Dense Retrieval (2021)
- Injecting The BM25 Score As Text Improves BERT-based Re-rankers (2023)
- Towards Robust Ranker For Text Retrieval (2022)