Hybrid Contrastive Quantization For Efficient Cross-view Video Retrieval
2022 · Jinpeng Wang, Bin Chen, Dongliang Liao, et al.
Abstract
With the recent boom of video-based social platforms (e.g., YouTube and TikTok), video retrieval using sentence queries has become an important demand and attracts increasing research attention. Despite their decent performance, existing text-video retrieval models in the vision and language communities are impractical for large-scale Web search because they adopt brute-force search over high-dimensional embeddings. To improve efficiency, Web search engines widely apply vector compression libraries (e.g., FAISS) to post-process the learned embeddings. Unfortunately, compressing embeddings separately from feature encoding degrades the robustness of the representations and incurs performance decay. To pursue a better balance between performance and efficiency, we propose the first quantized representation learning method for cross-view video retrieval, namely Hybrid Contrastive Quantization (HCQ). Specifically, HCQ learns both coarse-grained and fine-grained quantizations with transformers, which provide c
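The contrast the abstract draws between brute-force search and post-hoc vector compression can be illustrated with a minimal sketch. It is not HCQ and does not use FAISS: it is plain product quantization written in NumPy, applied to random vectors that stand in for already-learned video embeddings. All function names, sizes, and parameters below are assumptions made for illustration.

```python
import numpy as np

def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)  # (n, k)
        assign = d2.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(0)
    return centroids

def pq_train(x, m, k):
    """Split each vector into m sub-vectors; learn one k-word codebook per sub-space."""
    n, d = x.shape
    assert d % m == 0 and k <= 256  # codes are stored as uint8
    subs = x.reshape(n, m, d // m)
    return np.stack([kmeans(subs[:, i], k, seed=i) for i in range(m)])  # (m, k, d/m)

def pq_encode(x, codebooks):
    """Replace each sub-vector by the index of its nearest codeword."""
    m, k, ds = codebooks.shape
    subs = x.reshape(len(x), m, ds)
    d2 = ((subs[:, :, None, :] - codebooks[None]) ** 2).sum(-1)  # (n, m, k)
    return d2.argmin(-1).astype(np.uint8)  # (n, m)

def pq_search(query, codes, codebooks, topk=5):
    """Approximate search: build a query-to-codeword table, then sum table lookups."""
    m, k, ds = codebooks.shape
    table = ((query.reshape(m, 1, ds) - codebooks) ** 2).sum(-1)  # (m, k)
    dists = table[np.arange(m), codes].sum(1)  # (n,)
    return np.argsort(dists)[:topk]

def brute_force_search(query, x, topk=5):
    """Exact search over the full-precision embeddings."""
    return np.argsort(((x - query) ** 2).sum(1))[:topk]

# Random stand-ins for learned embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
video_emb = rng.normal(size=(1000, 32)).astype(np.float32)
codebooks = pq_train(video_emb, m=4, k=16)
codes = pq_encode(video_emb, codebooks)

exact = brute_force_search(video_emb[0], video_emb)
approx = pq_search(video_emb[0], codes, codebooks)
print(video_emb.nbytes, "bytes of float32 vs", codes.nbytes, "bytes of PQ codes")
```

Here the codebooks are fitted after the embeddings are fixed, which is the post-processing setup the abstract criticizes: the encoder never sees the quantization error, so `approx` can diverge from `exact`.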
Related papers
- Efficient Cross-modal Video Retrieval With Meta-optimized Frames (2022) 7.16
- Dual Encoding For Video Retrieval By Text (2020) 16.05
- Towards Fast Adaptation Of Pretrained Contrastive Models For Multi-channel Video-language Retrieval (2022) 7.50
- Tree-augmented Cross-modal Encoding For Complex-query Video Retrieval (2020) 15.57
- Query-centric Audio-visual Cognition Network For Moment Retrieval, Segmentation And Step-captioning (2024) 3.58
- Central Similarity Quantization For Efficient Image And Video Retrieval (2019) 23.49
- Differentiable Resolution Compression And Alignment For Efficient Video Classification And Retrieval (2023) 5.27
- Exploiting Local Indexing And Deep Feature Confidence Scores For Fast Image-to-video Search (2018) 2.26