Compressed Concatenation Of Small Embedding Models
2025 · Mohamed Ayoub Ben Ayad, Michael Dinzinger, Kanishka Ghosh Dastidar, et al.
Abstract
Embedding models are central to dense retrieval, semantic search, and recommendation systems, but their size often makes them impractical to deploy in resource-constrained environments such as browsers or edge devices. While smaller embedding models offer practical advantages, they typically underperform compared to their larger counterparts. To bridge this gap, we demonstrate that concatenating the raw embedding vectors of multiple small models can outperform a single larger baseline on standard retrieval benchmarks. To overcome the resulting high dimensionality of naive concatenation, we introduce a lightweight unified decoder trained with a Matryoshka Representation Learning (MRL) loss. This decoder maps the high-dimensional joint representation to a low-dimensional space, preserving most of the original performance without fine-tuning the base models. We also show that while concatenating more base models yields diminishing gains, the robustness of the decoder's representation unde
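The pipeline the abstract describes (concatenate the raw vectors of several small models, then compress the joint vector with a decoder trained under a Matryoshka-style loss) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The model dimensions, the single linear decoder, the in-batch contrastive (InfoNCE) objective, the temperature, and the nested dimensions are all assumptions, and random arrays stand in for real model outputs.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def concat_embeddings(embeddings):
    """Concatenate per-model embeddings (each [n, d_i]) along the feature axis."""
    return np.concatenate([l2_normalize(e) for e in embeddings], axis=-1)

def decode(joint, weight):
    """Hypothetical linear decoder: maps [n, D] joint vectors to [n, d_out]."""
    return joint @ weight

def info_nce(q, p, temperature=0.05):
    """In-batch contrastive loss; row i of q is the positive for row i of p."""
    logits = l2_normalize(q) @ l2_normalize(p).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def mrl_loss(q, p, nested_dims):
    """Matryoshka-style loss: average the loss over nested prefixes of the output."""
    return float(np.mean([info_nce(q[:, :d], p[:, :d]) for d in nested_dims]))

rng = np.random.default_rng(0)
n = 8
model_dims = [384, 384, 256]  # assumed output sizes of three small models

# Stand-ins for real model outputs: passages are noisy copies of the queries.
queries = [rng.normal(size=(n, d)) for d in model_dims]
passages = [q + 0.1 * rng.normal(size=q.shape) for q in queries]

joint_q = concat_embeddings(queries)   # shape [8, 1024]
joint_p = concat_embeddings(passages)

# Untrained decoder weights; in practice these would be learned by minimizing mrl_loss.
weight = rng.normal(size=(sum(model_dims), 64)) / np.sqrt(sum(model_dims))
out_q = decode(joint_q, weight)
out_p = decode(joint_p, weight)
loss = mrl_loss(out_q, out_p, nested_dims=[16, 32, 64])
```

Because the loss is averaged over nested prefixes, a decoder trained this way is pushed to keep the most useful information in its leading coordinates, so the output can be truncated to 16 or 32 dimensions at query time without retraining. The base models stay frozen throughout, which matches the abstract's statement that they are not fine-tuned.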
Related papers
- SMEC: Rethinking Matryoshka Representation Learning For Retrieval Embedding Compression (2025)
- Beyond Matryoshka: Revisiting Sparse Coding For Adaptive Representation (2025)
- Mixed-precision Embeddings For Large-scale Recommendation Models (2024)
- Rethinking Hybrid Retrieval: When Small Embeddings And LLM Re-ranking Beat Bigger Models (2025)
- Matryoshka-Adaptor: Unsupervised And Supervised Tuning For Smaller Embedding Dimensions (2024)
- Scaling Laws For Embedding Dimension In Information Retrieval (2026)
- CREM: Compression-driven Representation Enhancement For Multimodal Retrieval And Comprehension (2026)
- MetaEmbed: Scaling Multimodal Retrieval At Test-time With Flexible Late Interaction (2025)