WSDM Cup 2026 Multilingual Retrieval: A Low-cost Multi-stage Retrieval Pipeline
2026 · Chentong Hao, Minmao Wang
Abstract
We present a low-cost retrieval system for the WSDM Cup 2026 multilingual retrieval task, in which English queries are used to retrieve relevant documents from a collection of approximately ten million news articles in Chinese, Persian, and Russian, and the top-1000 ranked results are returned for each query. Our four-stage pipeline combines LLM-based GRF-style query expansion, BM25 candidate retrieval, dense ranking using long-text representations from jina-embeddings-v4, and pointwise re-ranking of the top-20 candidates with Qwen3-Reranker-4B, while preserving the dense order for the remaining results. On the official evaluation, the system achieves an nDCG@20 of 0.403 and a Judged@20 of 0.95. We further conduct extensive ablation experiments to quantify the contribution of each stage and to analyze the effectiveness of query expansion, dense ranking, and top-\(k\) re-ranking under limited compute budgets.
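The last stage described above (re-rank only the top-20 and leave the rest in dense order) can be illustrated with a minimal sketch. It is not the authors' implementation: `rerank_top_k` and `score_fn` are hypothetical names, and `score_fn` stands in for whatever pointwise relevance score a reranker such as Qwen3-Reranker-4B would return for a query-document pair.

```python
from typing import Callable, List, Sequence


def rerank_top_k(
    dense_ranked: Sequence[str],
    score_fn: Callable[[str], float],
    k: int = 20,
) -> List[str]:
    """Re-rank the first k document ids by score_fn and keep the tail as-is.

    dense_ranked: document ids in dense-ranking order (best first).
    score_fn: hypothetical stand-in for a pointwise reranker score;
              it is only ever called on the first k documents.
    """
    head = list(dense_ranked[:k])
    tail = list(dense_ranked[k:])
    # sorted() is stable, so documents with equal scores keep their dense order.
    head_sorted = sorted(head, key=score_fn, reverse=True)
    return head_sorted + tail


# Toy usage with made-up scores for the first three documents only.
dense = ["d1", "d2", "d3", "d4", "d5"]
toy_scores = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
final = rerank_top_k(dense, lambda doc_id: toy_scores[doc_id], k=3)
print(final)  # ['d2', 'd3', 'd1', 'd4', 'd5']
```

Because the scorer is applied to at most `k` documents per query, reranking cost is bounded by `k` and does not grow with the length of the returned list, which fits the limited-compute setting the abstract describes.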