AIRwaves at CheckThat! 2025: Retrieving Scientific Sources for Implicit Claims on Social Media with Dual Encoders and Neural Re-ranking
2025 · Cem Ashbaugh, Leon Baumgärtner, Tim Gress, et al.
Abstract
Linking implicit scientific claims made on social media to their original publications is crucial for evidence-based fact-checking and scholarly discourse, yet it is hindered by lexical sparsity, very short queries, and domain-specific language. Team AIRwaves ranked second in Subtask 4b of the CLEF-2025 CheckThat! Lab with an evidence-retrieval approach that markedly outperforms the competition baseline. The optimized sparse-retrieval baseline (BM25) achieves MRR@5 = 0.5025 on the gold-label blind test set. To surpass this baseline, a two-stage retrieval pipeline is introduced: (i) a first stage that uses a dual encoder based on E5-large, fine-tuned using in-batch and mined hard negatives and enhanced through chunked tokenization and rich document metadata; and (ii) a neural re-ranking stage using a SciBERT cross-encoder. Replacing purely lexical matching with neural representations lifts performance to MRR@5 = 0.6174, and the complete pipeline further improves to MRR@5 = 0.6828. The fi
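The MRR@5 figures quoted in the abstract can be read with a small worked example. The sketch below is illustrative only: it assumes each query has exactly one gold document, and the function name `mrr_at_k` and its arguments are hypothetical, not taken from the paper or the lab's scoring script.

```python
def mrr_at_k(ranked_ids, gold_ids, k=5):
    """Mean reciprocal rank truncated at k.

    ranked_ids: one ranked list of candidate document ids per query.
    gold_ids:   the single gold document id per query (assumption:
                exactly one relevant document per query).
    """
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("ranked_ids and gold_ids must have the same length")
    if not gold_ids:
        return 0.0

    total = 0.0
    for candidates, gold in zip(ranked_ids, gold_ids):
        # Only the top-k candidates can contribute to the score.
        for rank, doc_id in enumerate(candidates[:k], start=1):
            if doc_id == gold:
                total += 1.0 / rank
                break  # a query scores at most once
    return total / len(gold_ids)


# Three toy queries: gold at rank 1, gold at rank 3, gold not retrieved.
toy_rankings = [["a", "b"], ["x", "y", "z"], ["p"]]
toy_gold = ["a", "z", "q"]
score = mrr_at_k(toy_rankings, toy_gold)  # (1 + 1/3 + 0) / 3
```

Under this definition, a query whose gold document appears below rank 5 contributes zero, so gains in MRR@5 reflect gold documents moving into, or higher within, the top five results.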
Related papers
- Deep Retrieval At CheckThat! 2025: Identifying Scientific Papers From Implicit Social Media Mentions Via Hybrid Retrieval And Re-ranking (2025)
- Beyond Retrieval: Ensembling Cross-encoders And GPT Rerankers With LLMs For Biomedical QA (2025)
- Team IELAB At TREC Clinical Trial Track 2023: Enhancing Clinical Trial Retrieval With Neural Rankers And Large Language Models (2024)
- DS@GT At TREC TOT 2025: Bridging Vague Recollection With Fusion Retrieval And Learned Reranking (2026)
- Enhancing The Ranking Context Of Dense Retrieval Methods Through Reciprocal Nearest Neighbors (2023)
- Systematic Evaluation Of Neural Retrieval Models On The Touché 2020 Argument Retrieval Subset Of BEIR (2024)
- WSDM Cup 2026 Multilingual Retrieval: A Low-cost Multi-stage Retrieval Pipeline (2026)
- Noisy Self-training With Synthetic Queries For Dense Retrieval (2023)