On The Reproducibility Of Learned Sparse Retrieval Adaptations For Long Documents
2025 · Emmanouil Georgios Lionis, Jia-Huei Ju
Abstract
Document retrieval is one of the most challenging tasks in Information Retrieval. It requires handling longer contexts, often resulting in higher query latency and increased computational overhead. Recently, Learned Sparse Retrieval (LSR) has emerged as a promising approach to address these challenges. Prior work has proposed adapting LSR to longer documents by aggregating segmented documents using different post-hoc methods, including n-grams and proximity scores, adjusting representations, and learning to ensemble all signals. In this study, we aim to reproduce and examine the mechanisms of adapting LSR for long documents. Our reproducibility experiments confirmed the importance of specific segments, with the first segment consistently dominating document retrieval performance. Furthermore, we re-evaluate recently proposed methods -- ExactSDM and SoftSDM -- across varying document lengths, from short (up to 2 segments) to longer (3+ segments). We also designed multiple analyse
Related papers
- Adapting Learned Sparse Retrieval For Long Documents (2023)
- L^2R: Lifelong Learning For First-stage Retrieval With Backward-compatible Representations (2023)
- Mistral-SPLADE: LLMs For Better Learned Sparse Retrieval (2024)
- SV-RAG: LoRA-contextualizing Adaptation Of MLLMs For Long Document Understanding (2024)
- Approximate Cluster-based Sparse Document Retrieval With Segmented Maximum Term Weights (2024)
- CSPLADE: Learned Sparse Retrieval With Causal Language Models (2025)
- Scaling Sparse And Dense Retrieval In Decoder-only LLMs (2025)
- On The Challenges And Opportunities Of Learned Sparse Retrieval For Code (2026)