PARM: A Paragraph Aggregation Retrieval Model For Dense Document-to-document Retrieval
2022 · Sophia Althammer, Sebastian Hofstätter, Mete Sertkan, et al.
Abstract
Dense passage retrieval (DPR) models show great effectiveness gains in first-stage retrieval for the web domain. However, the web domain offers large amounts of training data and involves query-to-passage or query-to-document retrieval tasks. In this paper we investigate dense document-to-document retrieval with limited labelled target data for training, in particular legal case retrieval. To use DPR models for document-to-document retrieval, we propose a Paragraph Aggregation Retrieval Model (PARM), which liberates DPR models from their limited input length. PARM retrieves documents at the paragraph level: for each query paragraph, relevant documents are retrieved based on their paragraphs. The results per query paragraph are then aggregated into one ranked list for the whole query document. For the aggregation we propose vector-based aggregation with reciprocal rank fusion (VRRF) weighting, which combines the advantages of rank-based aggregation and top
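The rank-based half of the aggregation step described in the abstract can be illustrated with plain reciprocal rank fusion. The sketch below is a minimal illustration under stated assumptions, not the paper's VRRF: the function name `rrf_aggregate`, the input layout (one ranked list of document IDs per query paragraph), and the smoothing constant `k=60` (a value commonly used for reciprocal rank fusion) are all assumptions made here for illustration.

```python
from collections import defaultdict


def rrf_aggregate(per_paragraph_rankings, k=60):
    """Fuse one ranked list per query paragraph into a single document ranking.

    per_paragraph_rankings: list of lists. Each inner list holds the document ID
    of every retrieved paragraph, best first, for one query paragraph
    (hypothetical input layout).
    k: smoothing constant of reciprocal rank fusion (assumed value).
    """
    scores = defaultdict(float)
    for ranking in per_paragraph_rankings:
        seen = set()
        for rank, doc_id in enumerate(ranking, start=1):
            # A document may appear several times through different paragraphs;
            # only its best-ranked paragraph counts for this query paragraph.
            if doc_id in seen:
                continue
            seen.add(doc_id)
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Three query paragraphs, each with its own retrieved list of document IDs.
rankings = [
    ["d1", "d2", "d3"],
    ["d2", "d1"],
    ["d2", "d4"],
]
fused = rrf_aggregate(rankings)
print(fused)
```

Documents retrieved near the top for several query paragraphs accumulate the highest fused score, which is the intuition behind aggregating paragraph-level results into one list for the whole query document.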
Related papers
- DAPR: A Benchmark On Document-aware Passage Retrieval (2023) · 5.18
- Dense Passage Retrieval: Is It Retrieving? (2024) · 6.34
- Aggretriever: A Simple Approach To Aggregate Textual Representations For Robust Dense Passage Retrieval (2022) · 13.22
- Augmenting Document Representations For Dense Retrieval With Interpolation And Perturbation (2022) · 5.84
- Improving Dense Passage Retrieval With Multiple Positive Passages (2025) · 0.00
- Improving Query Representations For Dense Retrieval With Pseudo Relevance Feedback: A Reproducibility Study (2021) · 7.16
- A Passage-based Approach To Learning To Rank Documents (2019) · 8.60
- MA-DPR: Manifold-aware Distance Metrics For Dense Passage Retrieval (2025) · 0.00