Operational Advice For Dense And Sparse Retrievers: HNSW, Flat, Or Inverted Indexes?
2024 · Jimmy Lin
Abstract
Practitioners working on dense retrieval today face a bewildering number of choices. Beyond selecting the embedding model, another consequential choice is the actual implementation of nearest-neighbor vector search. While best practices recommend HNSW indexes, flat vector indexes with brute-force search represent another viable option, particularly for smaller corpora and for rapid prototyping. In this paper, we provide experimental results on the BEIR dataset using the open-source Lucene search library that explicate the tradeoffs between HNSW and flat indexes (including quantized variants) from the perspectives of indexing time, query evaluation performance, and retrieval quality. With additional comparisons between dense and sparse retrievers, our results provide guidance for today's search practitioner in understanding the design space of dense and sparse retrievers. To our knowledge, we are the first to provide operational advice supported by empirical experiments in this regard.
Related papers
- End-to-end Retrieval With Learned Dense And Sparse Representations Using Lucene (2023)
- Hybrid Inverted Index Is A Robust Accelerator For Dense Retrieval (2022)
- Lucene For Approximate Nearest-neighbors Search On Arbitrary Dense Vectors (2019)
- Scaling Laws For Embedding Dimension In Information Retrieval (2026)
- Efficient Inverted Indexes For Approximate Retrieval Over Learned Sparse Representations (2024)
- Efficient And Effective Retrieval Of Dense-sparse Hybrid Vectors Using Graph-based Approximate Nearest Neighbor Search (2024)
- From HNSW To Information-theoretic Binarization: Rethinking The Architecture Of Scalable Vector Search (2025)
- Towards Competitive Search Relevance For Inference-free Learned Sparse Retrievers (2024)