Shallow Cross-encoders For Low-latency Retrieval
2024 · Aleksandr V. Petrov, Sean MacAvaney, Craig MacDonald
Abstract
Transformer-based Cross-Encoders achieve state-of-the-art effectiveness in text retrieval. However, Cross-Encoders based on large transformer models (such as BERT or T5) are computationally expensive and can score only a small number of documents within a reasonably small latency window. At the same time, keeping search latencies low is important for user satisfaction and energy usage. In this paper, we show that weaker shallow transformer models (i.e., transformers with a limited number of layers) actually perform better than full-scale models when constrained to these practical low-latency settings, since they can estimate the relevance of more documents within the same time budget. We further show that shallow transformers may benefit from the generalized Binary Cross-Entropy (gBCE) training scheme, which has recently demonstrated success for recommendation tasks. Our experiments with TREC Deep Learning passage ranking query sets demonstrate significant improvements in shallow and full-sc
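The time-budget argument in the abstract comes down to simple arithmetic: under a fixed latency window, a model with a lower per-document scoring cost can re-rank a deeper candidate list. The sketch below illustrates this under stated assumptions. The helper name `docs_within_budget`, the 50 ms budget, and both per-document latencies are hypothetical placeholders, not measurements from the paper, and the model assumes scoring cost grows linearly with the number of documents (no batching effects or fixed overhead).

```python
def docs_within_budget(budget_ms: float, per_doc_ms: float) -> int:
    """Return how many documents can be scored before the budget runs out.

    Assumes a constant per-document cost, which is a simplification.
    """
    if per_doc_ms <= 0:
        raise ValueError("per_doc_ms must be positive")
    if budget_ms <= 0:
        return 0
    return int(budget_ms // per_doc_ms)


# Hypothetical numbers chosen only to illustrate the trade-off.
BUDGET_MS = 50.0
FULL_MODEL_MS_PER_DOC = 2.0      # assumed cost of a full-scale cross-encoder
SHALLOW_MODEL_MS_PER_DOC = 0.25  # assumed cost of a shallow cross-encoder

full_depth = docs_within_budget(BUDGET_MS, FULL_MODEL_MS_PER_DOC)
shallow_depth = docs_within_budget(BUDGET_MS, SHALLOW_MODEL_MS_PER_DOC)

print(f"full-scale model re-ranks {full_depth} candidates")
print(f"shallow model re-ranks {shallow_depth} candidates")
```

Under these assumed costs the shallow model sees a candidate list eight times deeper. Whether that extra depth outweighs its weaker per-document relevance estimates is the empirical question the paper addresses.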
Related papers
- Quality And Cost Trade-offs In Passage Re-ranking Task (2021) · 0.00
- Towards Efficient Cross-modal Visual Textual Retrieval Using Transformer-encoder Deep Features (2021) · 6.34
- How Different Are Pre-trained Transformers For Text Ranking? (2022) · 7.81
- Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021) · 15.16
- Can Cross Encoders Produce Useful Sentence Embeddings? (2025) · 0.00
- Evaluating Multilingual Text Encoders For Unsupervised Cross-lingual Retrieval (2021) · 7.50
- Predicting Efficiency/effectiveness Trade-offs For Dense Vs. Sparse Retrieval Strategy Selection (2021) · 11.29
- Transfer Learning Approaches For Building Cross-language Dense Retrieval Models (2022) · 10.97