Description-based Text Similarity
2023 · Shauli Ravfogel, Valentina Pyatkin, Amir DN Cohen, et al.
Abstract
Identifying texts with a given semantics is central to many information-seeking scenarios. Similarity search over vector embeddings appears to be central to this ability, yet the similarity reflected in current text embeddings is corpus-driven, and is inconsistent and sub-optimal for many use cases. What, then, is a good notion of similarity for effective retrieval of text? We identify the need to search for texts based on abstract descriptions of their content, and the corresponding notion of *description-based similarity*. We demonstrate the inadequacy of current text embeddings and propose an alternative model that yields significant improvements when used in standard nearest-neighbor search. The model is trained using positive and negative pairs sourced through prompting an LLM, demonstrating how data from LLMs can be used to create new capabilities not immediately possible using the original model.
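The abstract frames retrieval as nearest-neighbor search over text embeddings, with the encoder trained on positive and negative (description, text) pairs. The sketch below illustrates both pieces in plain Python under stated assumptions: the vectors are toy stand-ins for encoder outputs, cosine similarity is assumed as the similarity measure, and the InfoNCE-style loss is one common contrastive objective, not necessarily the exact loss the authors use. All helper names (`cosine`, `nearest_neighbors`, `contrastive_loss`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (assumed similarity measure)."""
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def nearest_neighbors(query: Vector, corpus: Dict[str, Vector], k: int) -> List[str]:
    """Brute-force search: ids of the k corpus vectors most similar to the query."""
    scored = sorted(corpus.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]


def contrastive_loss(
    desc: Vector, positive: Vector, negatives: List[Vector], temperature: float = 0.1
) -> float:
    """InfoNCE-style loss (an assumed objective): negative log-softmax
    probability of the positive text among the positive plus negatives."""
    logits = [cosine(desc, positive) / temperature]
    logits += [cosine(desc, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Hypothetical embeddings of three sentences and one abstract description.
    corpus = {
        "s1": [0.9, 0.1, 0.0],
        "s2": [0.1, 0.9, 0.0],
        "s3": [0.0, 0.2, 0.9],
    }
    description = [1.0, 0.0, 0.0]
    print(nearest_neighbors(description, corpus, k=2))  # ['s1', 's2']
    # Loss is small when the positive is closest to the description.
    print(contrastive_loss(description, corpus["s1"], [corpus["s2"], corpus["s3"]]))
```

The linear scan keeps the example dependency-free; over a large corpus one would typically swap it for an approximate nearest-neighbor index, while the training objective only changes how the embeddings are produced, not how they are searched.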
Related papers
- Evaluating The Impact Of Word Embeddings On Similarity Scoring In Practical Information Retrieval (2026)
- Rethinking Similarity Search: Embracing Smarter Mechanisms Over Smarter Data (2023)
- Semantic Vector Encoding And Similarity Search Using Fulltext Search Engines (2017)
- Utilizing Embeddings For Ad-hoc Retrieval By Document-to-document Similarity (2017)
- Vectorsearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024)
- Vector Embedding Of Multi-modal Texts: A Tool For Discovery? (2025)
- Learning To Embed Semantic Similarity For Joint Image-text Retrieval (2022)
- Improving Text-based Person Search Via Part-level Cross-modal Correspondence (2024)