Automatic Construction Of Evaluation Sets And Evaluation Of Document Similarity Models In Large Scholarly Retrieval Systems
2016 · Kriste Krstovski, David A. Smith, Michael J. Kurtz
Abstract
Retrieval systems for scholarly literature let the scientific community search, explore, and download scholarly articles across scientific disciplines. Used mostly by experts in a particular field, these systems keep user community logs that record which articles each user downloaded. In this paper we present a novel approach for automatically evaluating document similarity models in large collections of scholarly publications. Unlike typical evaluation settings, which use test collections consisting of query documents and human-annotated relevance judgments, we use download logs to automatically generate a pseudo-relevant set of similar document pairs. More specifically, we show that consecutively downloaded document pairs, extracted from a scholarly information retrieval (IR) system, can be used as a test collection for evaluating document similarity models. Another novel aspect of our approach lies in the method that we employ for eva
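The pair-extraction idea described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the log format (user id, timestamp, document id), the function name `consecutive_pairs`, and the optional `max_gap` cutoff are all assumptions introduced here.

```python
from collections import defaultdict
from typing import Iterable, List, Optional, Tuple

# Hypothetical log record: (user_id, timestamp_in_seconds, document_id).
LogRecord = Tuple[str, float, str]


def consecutive_pairs(
    log: Iterable[LogRecord], max_gap: Optional[float] = None
) -> List[Tuple[str, str]]:
    """Return (doc_a, doc_b) pairs downloaded back to back by the same user."""
    by_user = defaultdict(list)
    for user, ts, doc in log:
        by_user[user].append((ts, doc))

    pairs = []
    for events in by_user.values():
        events.sort()  # order each user's downloads by time
        for (t1, d1), (t2, d2) in zip(events, events[1:]):
            if d1 == d2:
                continue  # skip repeated downloads of the same document
            if max_gap is not None and t2 - t1 > max_gap:
                continue  # assumed cutoff: treat long gaps as unrelated
            pairs.append((d1, d2))
    return pairs


# Toy log, invented for illustration only.
log = [
    ("u1", 0.0, "A"),
    ("u1", 30.0, "B"),
    ("u1", 5000.0, "C"),
    ("u2", 10.0, "B"),
    ("u2", 20.0, "D"),
]
pairs = consecutive_pairs(log, max_gap=600.0)
```

Under these assumptions, the extracted pairs would play the role of pseudo-relevant judgments against which a document similarity model's output is scored.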
Related papers
- Are We On The Right Way For Assessing Document Retrieval-augmented Generation? (2025) · 0.00
- Fetch-a-set: A Large-scale Ocr-free Benchmark For Historical Document Retrieval (2024) · 0.00
- Chain Of Retrieval: Multi-aspect Iterative Search Expansion And Post-order Search Aggregation For Full Paper Retrieval (2025) · 0.95
- Performance Evaluation In Multimedia Retrieval (2024) · 8.82
- A Fast Text Similarity Measure For Large Document Collections Using Multi-reference Cosine And Genetic Algorithm (2018) · 4.52
- Beyond Benchmarks: Evaluating Embedding Model Similarity For Retrieval Augmented Generation Systems (2024) · 0.00
- Pre-training Tasks For Embedding-based Large-scale Retrieval (2020) · 0.00
- Cross-media Similarity Evaluation For Web Image Retrieval In The Wild (2017) · 9.59