HyReC: Exploring Hybrid-based Retriever For Chinese
2025 · Zunran Wang, Zheng Shenpeng, Wang Shenglan, et al.
Abstract
Hybrid-based retrieval methods, which unify dense-vector and lexicon-based retrieval, have garnered considerable attention in industry because of the performance gains they deliver. However, despite these promising results, hybrid paradigms remain largely underexplored in Chinese retrieval. In this paper, we introduce HyReC, an end-to-end optimization method designed specifically for hybrid-based retrieval in Chinese. HyReC improves performance by integrating the semantic union of terms into the representation model. It also includes a Global-Local-Aware Encoder (GLAE) that promotes consistent semantic sharing between lexicon-based and dense retrieval while minimizing interference between the two. To further refine alignment, we incorporate a Normalization Module (NM) that fosters mutual benefit between the two retrieval approaches. Finally, we evaluate HyReC on the C-MTEB retrieval benchmark to demonstrate its effectiveness.
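As a rough illustration of the generic idea in the first sentence of the abstract (combining a dense-vector score with a lexicon-based score), here is a minimal sketch. It does not reproduce HyReC: the GLAE and the Normalization Module are not modeled. The min-max rescaling, the linear weight `alpha`, and all helper names (`cosine`, `lexical_score`, `min_max`, `hybrid_rank`) are hypothetical choices made only for illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Dense side: cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lexical_score(q: Dict[str, float], d: Dict[str, float]) -> float:
    """Lexicon side: dot product of sparse term-weight maps over shared terms."""
    return sum(w * d[t] for t, w in q.items() if t in d)


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1] so the two signals are comparable (an assumption,
    not the paper's NM). A constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def hybrid_rank(q_dense: Sequence[float], q_sparse: Dict[str, float],
                docs: List[dict], alpha: float = 0.5) -> List[int]:
    """Return document indices sorted by a hypothetical linear fusion:
    alpha * dense + (1 - alpha) * lexical, after per-signal min-max scaling."""
    dense = min_max([cosine(q_dense, d["dense"]) for d in docs])
    lex = min_max([lexical_score(q_sparse, d["sparse"]) for d in docs])
    fused = [alpha * a + (1 - alpha) * b for a, b in zip(dense, lex)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


# Toy example with Chinese terms ("中文" = Chinese, "检索" = retrieval, "英文" = English).
docs = [
    {"dense": [1.0, 0.0], "sparse": {"中文": 1.0, "检索": 2.0}},
    {"dense": [0.0, 1.0], "sparse": {"英文": 1.5}},
    {"dense": [0.6, 0.8], "sparse": {"检索": 1.0}},
]
print(hybrid_rank([1.0, 0.0], {"中文": 1.0, "检索": 1.0}, docs))  # → [0, 2, 1]
```

Rescaling each signal before mixing matters because raw lexical weights and cosine similarities live on different scales; without it, whichever signal has the larger range would dominate the fused score regardless of `alpha`.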
Related papers
- DS@GT At TREC TOT 2025: Bridging Vague Recollection With Fusion Retrieval And Learned Reranking (2026)
- Boosting Data Utilization For Multilingual Dense Retrieval (2025)
- Unifier: A Unified Retriever For Large-scale Retrieval (2022)
- Rethinking Hybrid Retrieval: When Small Embeddings And LLM Re-ranking Beat Bigger Models (2025)
- HYRR: Hybrid Infused Reranking For Passage Retrieval (2022)
- Hybrid And Collaborative Passage Reranking (2023)
- Freeret: Mllms As Training-free Retrievers (2025)
- Mor: Better Handling Diverse Queries With A Mixture Of Sparse, Dense, And Human Retrievers (2025)