Image Hashing Via Cross-view Code Alignment In The Age Of Foundation Models
2025 · Ilyass Moummad, Kawtar Zaher, Hervé Goëau, et al.
Abstract
Efficient large-scale retrieval requires representations that are both compact and discriminative. Foundation models provide powerful visual and multimodal embeddings, but nearest neighbor search in these high-dimensional spaces is computationally expensive. Hashing offers an efficient alternative by enabling fast Hamming distance search with binary codes, yet existing approaches often rely on complex pipelines, multi-term objectives, designs specialized for a single learning paradigm, and long training times. We introduce CroVCA (Cross-View Code Alignment), a simple and unified principle for learning binary codes that remain consistent across semantically aligned views. A single binary cross-entropy loss enforces alignment, while coding-rate maximization serves as an anti-collapse regularizer to promote balanced and diverse codes. To implement this, we design HashCoder, a lightweight MLP hashing network with a final batch normalization layer to enforce balanced codes. HashCoder can be
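The ingredients named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. The helper names, the use of the other view's hard bits as the cross-entropy target, the `eps` value, and the `0.1` regularizer weight are all assumptions made for illustration. The coding-rate term uses the standard log-determinant form from the rate-reduction literature, and the last helper shows why binary codes make search cheap: Hamming distance reduces to XOR plus a bit count.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def alignment_bce(logits_a, logits_b):
    """Binary cross-entropy pulling view A's bit probabilities
    toward the hard bits of view B (assumed target choice)."""
    p = np.clip(sigmoid(logits_a), 1e-7, 1 - 1e-7)
    target = (logits_b > 0).astype(np.float64)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))

def coding_rate(z, eps=0.5):
    """Coding rate 0.5 * logdet(I + d / (n * eps^2) * Z^T Z).
    Larger values indicate more diverse, less collapsed codes."""
    n, d = z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * (z.T @ z))
    return 0.5 * float(logdet)

def hamming_distances(query_bits, db_bits):
    """Hamming distance from one packed query code to every packed database code."""
    xor = np.bitwise_xor(db_bits, query_bits)
    return np.unpackbits(xor, axis=1).sum(axis=1)

# Toy data: 8 items, 16-bit codes, two slightly different "views" of each item.
rng = np.random.default_rng(0)
logits_a = rng.normal(size=(8, 16))
logits_b = logits_a + 0.1 * rng.normal(size=(8, 16))

# Alignment loss minus a weighted coding rate (the weight 0.1 is an assumption).
loss = alignment_bce(logits_a, logits_b) - 0.1 * coding_rate(np.tanh(logits_a))

# Binarize by sign, pack 16 bits into 2 bytes per item, then search.
codes = np.packbits(logits_a > 0, axis=1)
dists = hamming_distances(codes[0], codes)
```

Here `dists[0]` is the distance from the first item to itself, so it is zero; the remaining entries rank the other items by the number of differing bits.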
Related papers
- Discriminative Cross-view Binary Representation Learning (2018) · 4.52
- Learning Discriminative Hashing Codes For Cross-modal Retrieval Based On Multi-view Features (2018) · 3.58
- Correlation Hashing Network For Efficient Cross-modal Retrieval (2016) · 11.67
- Simultaneous Feature Aggregating And Hashing For Compact Binary Code Learning (2019) · 9.92
- Compact Hash Code Learning With Binary Deep Neural Network (2017) · 9.03
- Hashing In The Zero Shot Framework With Domain Adaptation (2017) · 10.21
- Unsupervised Deep Hashing For Large-scale Visual Search (2016) · 9.59
- Weakly-paired Cross-modal Hashing (2019) · 0.00