Investigating Mixture Of Experts In Dense Retrieval
2024 · Effrosyni Sokli, Pranav Kasela, Georgios Peikos, et al.
Abstract
While Dense Retrieval Models (DRMs) have advanced Information Retrieval (IR), these neural models remain limited in generalizability and robustness. One way to address this limitation is the Mixture-of-Experts (MoE) architecture. Previous IR studies have incorporated MoE architectures within the Transformer layers of DRMs; our work instead investigates an architecture that integrates a single MoE block (SB-MoE) after the output of the final Transformer layer. Our empirical evaluation compares the retrieval effectiveness of SB-MoE with that of standard fine-tuning. Specifically, we fine-tune three DRMs (TinyBERT, BERT, and Contriever) across four benchmark collections with and without the MoE block. Moreover, since MoE performance varies with its parameters (i.e., the number of experts), we conduct additional experiments to investigate this aspect further. The findings show the effectiveness of SB-MoE especially for DRMs with
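To make the idea of a single MoE block placed after the final Transformer layer more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify the gating function, the expert shape, or how expert outputs are combined, so the softmax gate, the linear experts, the weighted-sum pooling, and every name below (`SingleMoEBlock`, `gate_weights`, `forward`) are assumptions made for illustration. The input `embedding` stands in for the vector a DRM would produce for a query or document.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_linear(rng, in_dim, out_dim, scale=0.1):
    """Randomly initialised (weights, bias) for a linear map; illustrative only."""
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(weights, bias, x):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class SingleMoEBlock:
    """Hypothetical single MoE block applied to an encoder's output embedding."""

    def __init__(self, dim, num_experts, seed=0):
        rng = random.Random(seed)
        # Assumption: each expert is one linear layer that keeps the dimension.
        self.experts = [init_linear(rng, dim, dim) for _ in range(num_experts)]
        # Assumption: the gate is a linear layer followed by a softmax.
        self.gate = init_linear(rng, dim, num_experts)

    def gate_weights(self, embedding):
        """One non-negative weight per expert, summing to 1."""
        return softmax(linear(*self.gate, embedding))

    def forward(self, embedding):
        """Gate-weighted sum of all expert outputs (one possible pooling choice)."""
        weights = self.gate_weights(embedding)
        outputs = [linear(w, b, embedding) for w, b in self.experts]
        return [sum(g * out[i] for g, out in zip(weights, outputs))
                for i in range(len(embedding))]


# Stand-in for the final-layer embedding of a query or document.
embedding = [0.2, -0.1, 0.4, 0.0]
block = SingleMoEBlock(dim=4, num_experts=3)
refined = block.forward(embedding)
```

Under these assumptions, `num_experts` is the parameter whose effect the abstract says is studied in additional experiments; the refined vector would replace the raw encoder output when scoring query-document similarity.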
Related papers
- Mixture Of Experts Approaches In Dense Retrieval Tasks (2025) 0.95
- CAME: Competitively Learning A Mixture-of-experts Model For First-stage Retrieval (2023) 6.34
- Contrastive Learning And Mixture Of Experts Enables Precise Vector Embeddings (2024) 0.00
- Routerretriever: Routing Over A Mixture Of Expert Embedding Models (2024) 0.00
- Interpreting Dense Retrieval As Mixture Of Topics (2021) 0.00
- Mixture Of Experts With Soft Nearest Neighbor Loss: Resolving Expert Collapse Via Representation Disentanglement (2026) 0.00
- Beyond Instruction-conditioning, Mote: Mixture Of Task Experts For Multi-task Embedding Models (2025) 0.00
- Investigating Multi-layer Representations For Dense Passage Retrieval (2025) 0.00