Ultron: An Ultimate Retriever On Corpus With A Model-based Indexer
2022 · Yujia Zhou, Jing Yao, Zhicheng Dou, et al.
Abstract
Document retrieval has been extensively studied within the index-retrieve framework for decades, which has withstood the test of time. Unfortunately, such a pipelined framework limits the optimization of the final retrieval quality, because indexing and retrieving are separate stages that cannot be jointly optimized in an end-to-end manner. To unify these two stages, we explore a model-based indexer for document retrieval. Concretely, we propose Ultron, which encodes the knowledge of all documents into the model and aims to retrieve relevant documents directly, end-to-end. For a model-based indexer, two main issues must be explored: how to represent docids and how to train the model. Existing solutions suffer from semantically deficient docids and limited supervised data. To tackle these two problems, we first devise two types of docids that are richer in semantics and easier for model inference. In addition, we propose a three-stage training workflow to capture more know
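The abstract does not say how a docid is produced at inference time. One common way for a model-based indexer to retrieve documents "directly" is to generate the docid token by token while allowing, at each step, only tokens that extend a prefix of some real docid. The sketch below assumes integer-token docids of equal length and uses a toy scoring function in place of a trained sequence-to-sequence model; `DocidTrie` and `constrained_greedy_decode` are hypothetical names, not Ultron's code.

```python
from typing import Callable, Dict, List, Sequence


class DocidTrie:
    """Prefix tree over docid token sequences (hypothetical helper)."""

    def __init__(self) -> None:
        self.children: Dict[int, "DocidTrie"] = {}

    def insert(self, tokens: Sequence[int]) -> None:
        # Walk down the tree, creating a node for every unseen token.
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, DocidTrie())


def constrained_greedy_decode(
    trie: DocidTrie,
    score: Callable[[List[int], int], float],
) -> List[int]:
    """Greedily pick the best-scoring token among those that keep the prefix
    inside the trie, so the output is always an existing docid.
    Assumes all docids have the same length (e.g. fixed-length codes)."""
    prefix: List[int] = []
    node = trie
    while node.children:
        # Only tokens that extend a real docid are candidates at this step.
        best = max(node.children, key=lambda tok: score(prefix, tok))
        prefix.append(best)
        node = node.children[best]
    return prefix


# Usage: three toy docids, each a sequence of three integer tokens.
trie = DocidTrie()
for docid in [(1, 4, 2), (1, 4, 7), (3, 0, 5)]:
    trie.insert(docid)

# Toy stand-in for model scores: token 9 scores highest but never forms a valid docid.
toy_scores = {1: 0.9, 3: 0.5, 4: 0.2, 7: 0.8, 2: 0.1, 9: 5.0}
result = constrained_greedy_decode(trie, lambda prefix, tok: toy_scores.get(tok, 0.0))
print(result)  # [1, 4, 7]
```

Greedy search keeps the sketch short; a retrieval system that must return a ranked list of documents would more likely run beam search over the same trie and keep the top beams.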
Related papers
- Tevatron 2.0: Unified Document Retrieval Toolkit Across Scale, Language, And Modality (2025)
- Unifier: A Unified Retriever For Large-scale Retrieval (2022)
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020)
- EnrichIndex: Using LLMs To Enrich Retrieval Indices Offline (2025)
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024)
- A Distributed Collaborative Retrieval Framework Excelling In All Queries And Corpora Based On Zero-shot Rank-oriented Automatic Evaluation (2024)
- Optimizing Retrieval Components For A Shared Backbone Via Component-wise Multi-stage Training (2026)
- PyLate: Flexible Training And Retrieval For Late Interaction Models (2025)