Towards Better Instruction Following Retrieval Models
2025 · Yuchen Zhuang, Aaron Trinh, Rushi Qiang, et al.
Abstract
Modern information retrieval (IR) models, trained exclusively on standard <query, passage> pairs, struggle to interpret and follow explicit user instructions. We introduce InF-IR, a large-scale, high-quality training corpus designed to improve retrieval models on Instruction-Following IR. InF-IR expands traditional training pairs into over 38,000 expressive <instruction, query, passage> triplets that serve as positive samples. For each positive triplet, we generate two additional hard negative examples by poisoning the instruction and the query. These negatives are then rigorously validated by an advanced reasoning model (o3-mini) to ensure that they remain semantically plausible while no longer satisfying the instruction. Unlike existing corpora, which primarily support computationally intensive reranking with decoder-only language models, the highly contrastive positive-negative triplets in InF-IR also enable efficient representation learning for smaller encoder-only models, facilitating direc…
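The abstract does not spell out the training objective, so the following is only a minimal sketch under stated assumptions. It assumes each hard negative keeps the positive passage and replaces either the instruction or the query with its poisoned version, and it assumes a standard InfoNCE-style contrastive loss. That loss is a common choice for encoder-only retrievers, but it is not confirmed as the authors' exact objective. The `Triplet` class, the `hard_negatives` and `info_nce` functions, and the default temperature are hypothetical illustrations, not part of InF-IR.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Triplet:
    """One <instruction, query, passage> example (hypothetical field names)."""
    instruction: str
    query: str
    passage: str


def hard_negatives(pos: Triplet, poisoned_instruction: str,
                   poisoned_query: str) -> List[Triplet]:
    """Assumed construction: keep the passage and swap in a poisoned
    instruction or a poisoned query, so the passage no longer satisfies it."""
    return [
        Triplet(poisoned_instruction, pos.query, pos.passage),
        Triplet(pos.instruction, poisoned_query, pos.passage),
    ]


def info_nce(pos_score: float, neg_scores: Sequence[float],
             temperature: float = 0.05) -> float:
    """Softmax cross-entropy that rewards ranking the positive above every
    negative. Scores would come from some encoder (not shown), e.g. the cosine
    similarity between an (instruction + query) embedding and a passage
    embedding."""
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating for stability
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]  # = -log softmax(positive)


pos = Triplet("Only return passages published after 2020.",
              "solar cell efficiency",
              "A 2023 study reports a new perovskite cell record.")
negs = hard_negatives(pos,
                      "Only return passages published before 2000.",
                      "wind turbine noise")
# Illustrative similarity scores for the positive and its two negatives.
loss = info_nce(0.8, [0.6, 0.5])
```

Under this assumption, the passage is the same across all three candidates. The only thing that separates the positive from its negatives is therefore whether the instruction and the query are actually satisfied. This is the kind of fine-grained contrast the abstract credits with making the corpus useful for small encoder-only models.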
Related papers
- Dual-view Training For Instruction-following Information Retrieval (2026)
- mFollowIR: A Multilingual Benchmark For Instruction Following In Retrieval (2025)
- MAIR: A Massive Benchmark For Evaluating Instructed Retrieval (2024)
- Can Instructed Retrieval Models Really Support Exploration? (2026)
- CustomIR: Unsupervised Fine-tuning Of Dense Embeddings For Known Document Corpora (2025)
- UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers (2024)
- I^3 Retriever: Incorporating Implicit Interaction In Pre-trained Language Models For Passage Retrieval (2023)
- UniIR: Training And Benchmarking Universal Multimodal Information Retrievers (2023)