A Dense Representation Framework For Lexical And Semantic Matching
2022 · Sheng-Chieh Lin, Jimmy Lin
Abstract
Lexical and semantic matching capture different successful approaches to text retrieval, and the fusion of their results has proven to be more effective and robust than either alone. Prior work performs hybrid retrieval by conducting lexical and semantic matching using different systems (e.g., Lucene and Faiss, respectively) and then fusing their model outputs. In contrast, our work integrates lexical representations with dense semantic representations by densifying high-dimensional lexical representations into what we call low-dimensional dense lexical representations (DLRs). Our experiments show that DLRs can effectively approximate the original lexical representations, preserving effectiveness while improving query latency. Furthermore, we can combine dense lexical and semantic representations to generate dense hybrid representations (DHRs) that are more flexible and yield faster retrieval compared to existing hybrid techniques. In addition, we explore jointly training lexical and
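The abstract does not spell out how a vocabulary-sized lexical vector is densified, so the following is only a minimal sketch under one assumption: the vector is cut into equal contiguous slices, and each slice keeps its maximum weight plus the position of that maximum. Two representations are then scored by multiplying the kept weights only where the positions agree, and a hybrid score adds a semantic inner product. The function names (`densify`, `gated_score`, `hybrid_score`), the `weight` parameter, and the toy vectors are all hypothetical and chosen for illustration.

```python
import numpy as np

def densify(lexical_vec, num_slices):
    """Assumed scheme: split the vector into equal slices, keep each slice's max and its position."""
    slices = np.asarray(lexical_vec, dtype=float).reshape(num_slices, -1)
    values = slices.max(axis=1)       # one weight per slice
    indices = slices.argmax(axis=1)   # which term inside the slice produced it
    return values, indices

def gated_score(query_dlr, doc_dlr):
    """Multiply weights only in slices where query and document kept the same term."""
    q_val, q_idx = query_dlr
    d_val, d_idx = doc_dlr
    gate = q_idx == d_idx
    return float(np.sum(q_val * d_val * gate))

def hybrid_score(query_dlr, doc_dlr, query_sem, doc_sem, weight=1.0):
    """Assumed fusion: lexical gated score plus a weighted semantic inner product."""
    return gated_score(query_dlr, doc_dlr) + weight * float(np.dot(query_sem, doc_sem))

# Toy vocabulary of 8 terms; the weights are made up for illustration.
query_lex = [0, 2, 0, 0, 1, 0, 0, 0]
doc_lex = [0, 3, 0, 0, 0, 4, 0, 0]
query_sem = np.array([0.5, 0.5])
doc_sem = np.array([1.0, 0.0])

exact = float(np.dot(query_lex, doc_lex))
approx = gated_score(densify(query_lex, 4), densify(doc_lex, 4))
fused = hybrid_score(densify(query_lex, 4), densify(doc_lex, 4), query_sem, doc_sem)
print(exact, approx, fused)
```

In this toy case the densified score matches the exact inner product because the only shared term is the maximum of its slice in both vectors. When a slice holds several non-zero terms, the kept maximum can hide a match, which is why such a representation is an approximation of the original one.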
Related papers
- Lexsembridge: Fine-grained Dense Representation Enhancement Through Token-aware Embedding Augmentation (2025) 2.35
- Ultra-high Dimensional Sparse Representations With Binarization For Efficient Text Retrieval (2021) 8.60
- BERM: Training The Balanced And Extractable Representation For Matching To Improve Generalization Ability Of Dense Retrieval (2023) 5.84
- Complementing Lexical Retrieval With Semantic Residual Embedding (2020) 13.50
- Learning To Retrieve: How To Train A Dense Retrieval Model Effectively And Efficiently (2020) 0.00
- What Are You Token About? Dense Retrieval As Distributions Over The Vocabulary (2022) 8.09
- Joint Fusion And Encoding: Advancing Multimodal Retrieval From The Ground Up (2025) 0.00
- Unifying Latent And Lexicon Representations For Effective Video-text Retrieval (2024) 0.00