Evaluating The Impact Of Word Embeddings On Similarity Scoring In Practical Information Retrieval
2026 · Niall McCarroll, Kevin Curran, Eugene McNamee, et al.
Abstract
Search behaviour is characterised by synonymy and polysemy, as users often want to search for information based on meaning. Semantic representation strategies represent a move towards richer associative connections that can adequately capture this complex usage of language. Vector Space Modelling (VSM) and neural word embeddings play a crucial role in modern machine learning and Natural Language Processing (NLP) pipelines. Embeddings use distributional semantics to represent words, sentences, paragraphs or entire documents as vectors in high-dimensional spaces. Information Retrieval (IR) systems can leverage these representations to exploit the semantic relatedness between queries and answers. This paper evaluates an alternative approach to measuring query statement similarity that moves away from the common similarity measure of centroids of neural word embeddings. Motivated by the Word Mover's Distance (WMD) model, similarity is evaluated using the distance between individual words of queries a
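The contrast the abstract draws, between comparing centroids of word embeddings and comparing individual words, can be illustrated with a minimal sketch. It is not the authors' implementation. The three-dimensional vectors in EMB are hypothetical toy values, and relaxed_word_distance is a simplified, WMD-inspired measure (each query word is matched to its nearest word in the other text, with uniform word weights) rather than the full optimal-transport formulation of WMD.

```python
import math

# Hypothetical toy embeddings for illustration only; real models
# typically use hundreds of dimensions learned from a corpus.
EMB = {
    "cheap":   [0.9, 0.1, 0.0],
    "budget":  [0.8, 0.2, 0.1],
    "flights": [0.1, 0.9, 0.2],
    "airfare": [0.2, 0.8, 0.3],
    "recipes": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def centroid(words):
    """Component-wise mean of the embeddings of the given words."""
    vecs = [EMB[w] for w in words]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def centroid_similarity(query, answer):
    """Baseline: cosine similarity between the two centroids."""
    return cosine(centroid(query), centroid(answer))

def relaxed_word_distance(query, answer):
    """Mean distance from each query word to its nearest answer word.

    A simplification assumed here for brevity: words carry uniform
    weight and each one is matched greedily, so this is a relaxation
    of WMD, not the exact transport cost.
    """
    total = sum(min(euclidean(EMB[q], EMB[a]) for a in answer) for q in query)
    return total / len(query)

query = ["cheap", "flights"]
related = ["budget", "airfare"]
unrelated = ["recipes"]

print(centroid_similarity(query, related), centroid_similarity(query, unrelated))
print(relaxed_word_distance(query, related), relaxed_word_distance(query, unrelated))
```

With these toy vectors both measures rank the related text above the unrelated one. The word-level measure keeps each word's contribution separate, whereas the centroid averages the words into a single point before comparing.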
Related papers
- Representing Documents And Queries As Sets Of Word Embedded Vectors For Information Retrieval (2016)
- Utilizing Embeddings For Ad-hoc Retrieval By Document-to-document Similarity (2017)
- Semantic Vector Encoding And Similarity Search Using Fulltext Search Engines (2017)
- Neural Vector Spaces For Unsupervised Information Retrieval (2017)
- Vectorsearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024)
- A Survey On Efficient Processing Of Similarity Queries Over Neural Embeddings (2022)
- Description-based Text Similarity (2023)
- Rethinking Similarity Search: Embracing Smarter Mechanisms Over Smarter Data (2023)