UniHGKR: Unified Instruction-aware Heterogeneous Knowledge Retrievers
2024 · Dehai Min, Zhiyang Xu, Guilin Qi, et al.
Abstract
Existing information retrieval (IR) models often assume a homogeneous structure for knowledge sources and user queries, limiting their applicability in real-world settings where retrieval is inherently heterogeneous and diverse. In this paper, we introduce UniHGKR, a unified instruction-aware heterogeneous knowledge retriever that (1) builds a unified retrieval space for heterogeneous knowledge and (2) follows diverse user instructions to retrieve knowledge of specified types. UniHGKR consists of three principal stages: heterogeneous self-supervised pretraining, text-anchored embedding alignment, and instruction-aware retriever fine-tuning, enabling it to generalize across varied retrieval contexts. This framework is highly scalable, with a BERT-based version and a UniHGKR-7B version trained on large language models. Also, we introduce CompMix-IR, the first native heterogeneous knowledge retrieval benchmark. It includes two retrieval scenarios with various instructions, over 9,400 ques
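The abstract names two capabilities: a single retrieval space shared by all knowledge types, and instructions that steer which type is returned. The toy sketch below illustrates only that interface. Everything in it is an assumption for illustration and is not UniHGKR's method. Items are linearized to text with a type prefix (the hypothetical `linearize`). A sparse bag-of-words `embed` stands in for the trained BERT-based or 7B encoder. The instruction is simply prepended to the query. In the real system, instruction-following is learned through the three training stages; here it emerges only from lexical overlap with the type prefix.

```python
import math
import re
from collections import Counter


def linearize(item):
    """Hypothetical: flatten one heterogeneous item into text, prefixed with its source type."""
    kind = item["type"]
    if kind == "kb":
        body = " ".join(item["triple"])  # (subject, predicate, object)
    elif kind == "table":
        body = " ".join(f"{col} {val}" for col, val in item["row"].items())
    else:  # free text
        body = item["text"]
    return f"{kind} {body}"


def embed(text):
    """Toy stand-in for a learned encoder: L2-normalized sparse bag of words."""
    counts = Counter(re.findall(r"[a-z0-9_]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def cosine(u, v):
    """Dot product of two normalized sparse vectors, i.e. their cosine similarity."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def retrieve(instruction, query, corpus, k=1):
    """Encode instruction + query and rank every item, whatever its type, in one shared space."""
    q = embed(f"{instruction} {query}")
    ranked = sorted(corpus, key=lambda it: cosine(q, embed(linearize(it))), reverse=True)
    return ranked[:k]


# Hypothetical mini-corpus mixing three knowledge types.
CORPUS = [
    {"type": "kb", "triple": ("Paris", "capital_of", "France")},
    {"type": "text", "text": "Paris is the capital of France."},
    {"type": "table", "row": {"city": "Paris", "country": "France", "population": "2.1M"}},
]

if __name__ == "__main__":
    for instr in ("Retrieve kb triple", "Retrieve text passage", "Retrieve table row"):
        print(instr, "->", retrieve(instr, "Paris France", CORPUS)[0]["type"])
```

The corpus and the query stay fixed in this example. Only the instruction changes, and it determines which knowledge type ranks first: the script prints `kb`, `text` and `table` for the three instructions in turn.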
Related papers
- UniIR: Training And Benchmarking Universal Multimodal Information Retrievers (2023) · 10.48
- Unifier: A Unified Retriever For Large-scale Retrieval (2022) · 7.50
- Heterogeneous Uncertainty-guided Composed Image Retrieval With Fine-grained Probabilistic Learning (2026) · 0.00
- mFollowIR: A Multilingual Benchmark For Instruction Following In Retrieval (2025) · 0.00
- Uni-Retriever: Towards Learning The Unified Embedding Based Retriever In Bing Sponsored Search (2022) · 9.92
- CorpusBrain: Pre-train A Generative Retrieval Model For Knowledge-intensive Language Tasks (2022) · 12.47
- Universal Vision-language Dense Retrieval: Learning A Unified Representation Space For Multi-modal Retrieval (2022) · 3.45
- MAIR: A Massive Benchmark For Evaluating Instructed Retrieval (2024) · 6.41