Succeeding At Scale: Automated Dataset Construction And Query-side Adaptation For Multi-tenant Search
2026 · Prateek Jain, Shabari S Nair, Ritesh Goru, et al.
Abstract
Large-scale multi-tenant retrieval systems generate extensive query logs but lack the curated relevance labels needed for effective domain adaptation, leaving a large volume of "dark data" underutilized. The challenge is compounded by the high cost of model updates: jointly fine-tuning the query and document encoders requires re-indexing the full corpus, which is impractical in multi-tenant settings with thousands of isolated indices. We introduce DevRev-Search, a passage retrieval benchmark for technical customer support built via a fully automated pipeline. Candidate generation fuses results from diverse sparse and dense retrievers, after which an LLM-as-a-Judge performs consistency filtering and relevance labeling. We further propose an Index-Preserving Adaptation strategy that fine-tunes only the query encoder, achieving strong performance gains while keeping document indices fixed. Experiments on DevRev-Search, SciFact, and FiQA-2018 show that Parameter-Efficient Fine-Tuning (PEFT) of the query encoder …
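The abstract says candidate passages come from fusing several sparse and dense retrievers but does not name the fusion rule. The sketch below assumes Reciprocal Rank Fusion (RRF), a common rank-based choice, purely for illustration: each retriever contributes 1 / (k + rank) for every passage it returns, and passages are reordered by the summed score. The function name `reciprocal_rank_fusion`, the constant `k=60`, and the passage IDs are hypothetical, not taken from the paper.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse ranked lists of passage IDs with Reciprocal Rank Fusion (assumed rule).

    rankings: one ranked list per retriever, best passage first.
    k: smoothing constant; 60 is a commonly used default, not a paper value.
    top_n: optional cutoff on the fused list.
    Returns passage IDs sorted by fused score, best first.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each retriever adds 1 / (k + rank); higher ranks contribute more.
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID so output is deterministic.
    fused = sorted(scores, key=lambda d: (-scores[d], d))
    return fused[:top_n] if top_n is not None else fused


# Hypothetical outputs from one sparse and two dense retrievers.
bm25 = ["p3", "p1", "p7"]
dense_a = ["p1", "p2", "p3"]
dense_b = ["p1", "p3", "p9"]

print(reciprocal_rank_fusion([bm25, dense_a, dense_b]))
# ['p1', 'p3', 'p2', 'p7', 'p9']
```

Because RRF uses only ranks, it sidesteps the need to calibrate BM25 scores against cosine similarities, which is one reason rank-based fusion is a popular default for mixing heterogeneous retrievers; the paper may well use a different scheme.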
Related papers
- Domain-adaptive And Scalable Dense Retrieval For Content-based Recommendation (2026) · 0.00
- Improving Query Representations For Dense Retrieval With Pseudo Relevance Feedback: A Reproducibility Study (2021) · 7.16
- A Deep Learning Approach For Selective Relevance Feedback (2024) · 6.34
- Mine And Refine: Optimizing Graded Relevance In E-commerce Search Retrieval (2026) · 0.00
- Dreditor: An Time-efficient Approach For Building A Domain-specific Dense Retrieval Model (2024) · 0.00
- PEFA: Parameter-free Adapters For Large-scale Embedding-based Retrieval Models (2023) · 7.73
- Bixse: Improving Dense Retrieval Via Probabilistic Graded Relevance Distillation (2025) · 0.00
- A Reference Architecture For Agentic Hybrid Retrieval In Dataset Search (2026) · 0.00