Beyond Matryoshka: Revisiting Sparse Coding For Adaptive Representation
2025 · Tiansheng Wen, Yifei Wang, Zequn Zeng, et al.
Abstract
Many large-scale systems rely on high-quality deep representations (embeddings) to facilitate tasks like retrieval, search, and generative modeling. Matryoshka Representation Learning (MRL) recently emerged as a solution for adaptive embedding lengths, but it requires full model retraining and suffers from noticeable performance degradations at short lengths. In this paper, we show that sparse coding offers a compelling alternative for achieving adaptive representation with minimal overhead and higher fidelity. We propose Contrastive Sparse Representation (CSR), a method that sparsifies pre-trained embeddings into a high-dimensional but selectively activated feature space. By leveraging lightweight autoencoding and task-aware contrastive objectives, CSR preserves semantic quality while allowing flexible, cost-effective inference at different sparsity levels. Extensive experiments on image, text, and multimodal benchmarks demonstrate that CSR consistently outperforms MRL in terms of bot
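The idea of sparsifying a dense embedding into a wider, selectively activated feature space can be illustrated with a minimal sketch. It assumes a top-k activation rule and a random, untrained encoder matrix; the abstract specifies neither, and the names `topk_sparse_encode`, `W_enc`, `d`, and `hidden` are hypothetical. The autoencoding and contrastive training objectives are omitted entirely.

```python
import random

def topk_sparse_encode(x, W_enc, k):
    """Map a dense vector to a wider space and keep at most k activations.

    Assumption: top-k selection stands in for the sparsification rule,
    which the abstract does not spell out.
    """
    # ReLU of the linear projection: one activation per hidden feature.
    acts = [max(0.0, sum(xi * wi for xi, wi in zip(x, w))) for w in W_enc]
    # Indices of the k largest activations; all other features are dropped.
    keep = sorted(range(len(acts)), key=lambda j: acts[j], reverse=True)[:k]
    # Store the code sparsely as {feature index: value}.
    return {j: acts[j] for j in keep if acts[j] > 0.0}

random.seed(0)
d, hidden = 16, 128  # hypothetical dense and sparse dimensions
# W_enc[j] is the (untrained, random) weight vector of hidden feature j.
W_enc = [[random.gauss(0, 1) / d ** 0.5 for _ in range(d)] for _ in range(hidden)]
x = [random.gauss(0, 1) for _ in range(d)]  # stand-in for a pre-trained embedding

# The same encoder yields codes at several sparsity levels.
codes = {k: topk_sparse_encode(x, W_enc, k) for k in (4, 16)}
for k, code in codes.items():
    print(k, len(code))
```

In this sketch, changing `k` at inference time is the only thing needed to trade accuracy for cost, and the features kept at a small `k` are a subset of those kept at a larger `k`.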
Related papers
- Matryoshka Representation Learning (2022) 12.37
- SMEC: Rethinking Matryoshka Representation Learning For Retrieval Embedding Compression (2025) 0.00
- Compressed Concatenation Of Small Embedding Models (2025) 0.00
- Matryoshka-adaptor: Unsupervised And Supervised Tuning For Smaller Embedding Dimensions (2024) 2.26
- Efficient Temporal-aware Matryoshka Adaptation For Temporal Information Retrieval (2026) 0.00
- Efficient Learning Of Sparse Representations From Interactions (2026) 1.57
- 2D Matryoshka Training For Information Retrieval (2024) 4.06
- Sparse And Dense Retrievers Learn Better Together: Joint Sparse-dense Optimization For Text-image Retrieval (2025) 0.00