Utilizing Embeddings For Ad-hoc Retrieval By Document-to-document Similarity
2017 · Chenhao Yang, Ben He, Yanhua Ran
Abstract
Latent semantic representations of words or paragraphs, known as embeddings, have been widely applied to information retrieval (IR). A common approach to utilizing embeddings for IR is to estimate the document-to-query (D2Q) similarity between their embeddings. Because words with similar syntactic usage tend to lie close together in the embedding space even when they are not semantically similar, the D2Q similarity approach may suffer from the problem of "multiple degrees of similarity". To address this, this paper proposes a novel approach that estimates a semantic relevance score (SEM) based on the document-to-document (D2D) similarity of embeddings. As Word2Vec or Para2Vec generates embeddings from the context of words or paragraphs, the D2D similarity approach turns the task of document ranking into estimating the similarity between the content of different documents. Experimental results on standard TREC test collections show that our proposed approach outperforms strong baselines.
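The contrast between D2Q and D2D scoring can be illustrated with a minimal sketch. The abstract does not say how SEM is computed, so everything here is an assumption for illustration: the function names, the toy two-dimensional vectors, and in particular the choice to score a document by its mean cosine similarity to a set of reference documents (`reference_ids`, for example the top results of an initial ranking). It is not the authors' formula.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity of two 1-D vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def d2q_scores(query_vec, doc_vecs):
    """D2Q: score each document by similarity to the query embedding."""
    return [cosine(query_vec, d) for d in doc_vecs]


def d2d_scores(doc_vecs, reference_ids):
    """D2D (hypothetical form): score each document by its mean similarity
    to the reference documents, skipping the document itself."""
    scores = []
    for i, d in enumerate(doc_vecs):
        refs = [doc_vecs[j] for j in reference_ids if j != i]
        sims = [cosine(d, r) for r in refs]
        scores.append(float(np.mean(sims)) if sims else 0.0)
    return scores


# Toy embeddings (assumed values, not from the paper).
doc_vecs = np.array([[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]])
query_vec = np.array([1.0, 0.0])

d2q = d2q_scores(query_vec, doc_vecs)
d2d = d2d_scores(doc_vecs, reference_ids=[0, 1])
print("D2Q:", d2q)
print("D2D:", d2d)
```

In this sketch the D2D score never touches the query embedding directly; the query influences it only through the choice of reference documents, which is one way to read "similarity between the content of different documents".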
Related papers
- Evaluating The Impact Of Word Embeddings On Similarity Scoring In Practical Information Retrieval (2026) 0.00
- Representing Documents And Queries As Sets Of Word Embedded Vectors For Information Retrieval (2016) 0.00
- Vectorsearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024) 0.00
- Description-based Text Similarity (2023) 0.00
- Text Embeddings For Retrieval From A Large Knowledge Base (2018) 4.52
- A Multi-resolution Word Embedding For Document Retrieval From Large Unstructured Knowledge Bases (2019) 0.00
- Improving Document Representations By Generating Pseudo Query Embeddings For Dense Retrieval (2021) 9.41
- On The Representational Limits Of Quantum-inspired 1024-D Document Embeddings: An Experimental Evaluation Framework (2026) 0.00