Learning Joint Embedding For Cross-modal Retrieval
2019 · Donghuo Zeng
Abstract
Cross-modal retrieval uses a query in one modality to obtain relevant data in another modality. The main challenge lies in bridging the heterogeneity gap so that similarity can be computed across modalities, a problem that has been broadly discussed in image-text, audio-text, and video-text multimedia data mining and retrieval. However, the gap between the temporal structures of different data modalities is not well addressed, because alignment relationships between temporal cross-modal structures are lacking. Our research focuses on learning the correlation between different modalities for the task of cross-modal retrieval. We have proposed an architecture, Supervised-Deep Canonical Correlation Analysis (S-DCCA), for cross-modal retrieval. In this forum paper, we discuss how to exploit triplet neural networks (TNN) to enhance correlation learning for cross-modal retrieval. The experimental result shows the proposed TNN-based supervised correlation learning architec
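The abstract names triplet neural networks but the visible text does not describe the loss they optimize. As a rough illustration only, the sketch below shows a generic hinge-style triplet loss on cosine distance between embeddings from two modalities. The function names, the margin value, and the toy two-dimensional embeddings are all assumptions made for this example; they are not taken from the paper and may differ from the author's formulation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def triplet_loss(query, positive, negative, margin=0.2):
    """Generic triplet loss on cosine distance (1 - similarity).

    query:    embedding of the query in one modality
    positive: embedding from the other modality with the same label
    negative: embedding from the other modality with a different label
    margin:   hypothetical value chosen for this sketch
    """
    d_pos = 1.0 - cosine_similarity(query, positive)
    d_neg = 1.0 - cosine_similarity(query, negative)
    # The loss is zero once the negative is farther than the positive by at least the margin.
    return max(0.0, d_pos - d_neg + margin)


# Toy embeddings (hypothetical, already projected into a shared space).
query_emb = [1.0, 0.0]
same_class_emb = [1.0, 0.0]
other_class_emb = [0.0, 1.0]

# Well-separated triplet: the positive is close, the negative is far.
loss_easy = triplet_loss(query_emb, same_class_emb, other_class_emb)
# Violating triplet: positive and negative roles are swapped.
loss_hard = triplet_loss(query_emb, other_class_emb, same_class_emb)
print(loss_easy, loss_hard)
```

In this toy setting the well-separated triplet incurs no loss, while the swapped triplet is penalized, which is the behavior that pushes same-label items from different modalities together in the shared space.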
Related papers
- Deep Triplet Neural Networks With Cluster-CCA For Audio-visual Cross-modal Retrieval (2019)
- Audio-visual Embedding For Cross-modal Music Video Retrieval Through Supervised Deep CCA (2019)
- Learning Shared Semantic Space With Correlation Alignment For Cross-modal Event Retrieval (2019)
- End-to-end Cross-modality Retrieval With CCA Projections And Pairwise Ranking Loss (2017)
- Deep Cross-modal Correlation Learning For Audio And Lyrics In Music Retrieval (2017)
- Continual Learning In Cross-modal Retrieval (2021)
- Webly Supervised Joint Embedding For Cross-modal Image-text Retrieval (2018)
- Adversarial Cross-modal Retrieval Via Learning And Transferring Single-modal Similarities (2019)