Efficient And Effective Tail Latency Minimization In Multi-stage Retrieval Systems
2017 · Joel MacKenzie, J. Shane Culpepper, Roi Blanco, et al.
Abstract
Scalable web search systems typically employ multi-stage retrieval architectures, where an initial stage generates a set of candidate documents that are then pruned and re-ranked. Since subsequent stages typically exploit a multitude of features of varying costs using machine-learned models, reducing the number of documents that are considered at each stage improves latency. In this work, we propose and validate a unified framework that can be used to predict a wide range of performance-sensitive parameters which minimize effectiveness loss, while simultaneously minimizing query latency, across all stages of a multi-stage search architecture. Furthermore, our framework can be easily applied in large-scale IR systems, can be trained without explicitly requiring relevance judgments, and can target a variety of different efficiency-effectiveness trade-offs, making it well suited to a wide range of search scenarios. Our results show that we can reliably predict a number of different parame
Related papers
- Dynamic Trade-off Prediction In Multi-stage Retrieval Systems (2016) 11.93
- Optimizing Retrieval Components For A Shared Backbone Via Component-wise Multi-stage Training (2026) 0.00
- Semantic Models For The First-stage Retrieval: A Comprehensive Review (2021) 14.27
- Scalingnote: Scaling Up Retrievers With Large Language Models For Real-world Dense Retrieval (2024) 0.00
- Towards Efficient And Robust Moment Retrieval System: A Unified Framework For Multi-granularity Models And Temporal Reranking (2025) 2.26
- MST-R: Multi-stage Tuning For Retrieval Systems And Metric Evaluation (2024) 0.00
- Are We There Yet? A Decision Framework For Replacing Term Based Retrieval With Dense Retrieval Systems (2022) 0.00
- Optimizing Compound Retrieval Systems (2025) 0.00