Tevatron: An Efficient And Flexible Toolkit For Dense Retrieval
2022 · Luyu Gao, Xueguang Ma, Jimmy Lin, et al.
Abstract
Recent rapid advancements in deep pre-trained language models and the introduction of large datasets have powered research in embedding-based dense retrieval. While several good research papers have emerged, many of them come with their own software stacks. These stacks are typically optimized for particular research goals rather than for efficiency or code structure. In this paper, we present Tevatron, a dense retrieval toolkit optimized for efficiency, flexibility, and code simplicity. Tevatron provides a standardized pipeline for dense retrieval, including text processing, model training, corpus/query encoding, and search. This paper presents an overview of Tevatron and demonstrates its effectiveness and efficiency across several IR and QA datasets. We also show how Tevatron's flexible design enables easy generalization across datasets, model architectures, and accelerator platforms (GPU/TPU). We believe Tevatron can serve as an effective software foundation for dense retrieval syst
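The encoding and search stages named in the abstract can be illustrated with a minimal sketch. This is not Tevatron's API: `toy_encode` and `search` are hypothetical helpers, and a hashed bag-of-words vector stands in for a trained language-model encoder. The sketch only shows the general shape of dense retrieval, assuming texts are mapped to fixed-size vectors and ranked by inner product.

```python
import zlib

import numpy as np


def toy_encode(texts, dim=4096):
    """Hypothetical stand-in for a trained encoder.

    Maps each text to an L2-normalized hashed bag-of-words vector.
    A real dense retriever would use a fine-tuned language model here.
    """
    vecs = np.zeros((len(texts), dim), dtype=np.float32)
    for i, text in enumerate(texts):
        for token in text.lower().split():
            # crc32 gives a hash that is stable across Python processes.
            vecs[i, zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norms = np.linalg.norm(vecs, axis=1, keepdims=True)
    return vecs / np.maximum(norms, 1e-12)


def search(query_vecs, corpus_vecs, k=2):
    """Exhaustive inner-product search; returns top-k ids and scores per query."""
    scores = query_vecs @ corpus_vecs.T
    top_ids = np.argsort(-scores, axis=1)[:, :k]
    top_scores = np.take_along_axis(scores, top_ids, axis=1)
    return top_ids, top_scores


# Corpus encoding happens once, offline; query encoding happens at search time.
corpus = [
    "dense retrieval with pre-trained language models",
    "cooking pasta with tomato sauce",
    "training neural networks on TPU accelerators",
]
corpus_vecs = toy_encode(corpus)
query_vecs = toy_encode(["dense retrieval models"])

top_ids, top_scores = search(query_vecs, corpus_vecs, k=2)
for rank, (doc_id, score) in enumerate(zip(top_ids[0], top_scores[0]), start=1):
    print(rank, corpus[doc_id], float(score))
```

Separating corpus encoding from search reflects the pipeline stages the abstract lists: corpus vectors can be computed once and reused across many queries, while the exhaustive matrix product used here would typically be replaced by an approximate nearest-neighbor index at scale.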
Related papers
- Tevatron 2.0: Unified Document Retrieval Toolkit Across Scale, Language, And Modality (2025)
- Efficiently Teaching An Effective Dense Retriever With Balanced Topic Aware Sampling (2021)
- Dense Text Retrieval Based On Pretrained Language Models: A Survey (2022)
- LightRetriever: A LLM-based Text Retrieval Architecture With Extremely Faster Query Inference (2025)
- Predicting Efficiency/Effectiveness Trade-offs For Dense Vs. Sparse Retrieval Strategy Selection (2021)
- PyLate: Flexible Training And Retrieval For Late Interaction Models (2025)
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020)
- Thinking Fast And Slow: Efficient Text-to-visual Retrieval With Transformers (2021)