Joint Representation Learning And Novel Category Discovery On Single- And Multi-modal Data
2021 · Xuhui Jia, Kai Han, Yukun Zhu, et al.
Abstract
This paper studies the problem of novel category discovery on single- and multi-modal data with labels from different but relevant categories. We present a generic, end-to-end framework to jointly learn a reliable representation and assign clusters to unlabelled data. To avoid over-fitting the learnt embedding to labelled data, we take inspiration from self-supervised representation learning by noise-contrastive estimation and extend it to jointly handle labelled and unlabelled data. In particular, we propose using category discrimination on labelled data and cross-modal discrimination on multi-modal data to augment the instance discrimination used in conventional contrastive learning approaches. We further employ the Winner-Take-All (WTA) hashing algorithm on the shared representation space to generate pairwise pseudo labels for unlabelled data to better predict cluster assignments. We thoroughly evaluate our framework on the large-scale multi-modal video benchmarks Kinetics-400 and VGG-Sound, an
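The WTA hashing step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the NumPy dependency, the number of hashes, the window size, and the match threshold `min_matches` are all assumptions chosen for illustration. Each hash applies a random permutation to the feature dimensions, keeps the first `window` entries, and records the position of the largest one; two samples are given a positive pairwise pseudo label when enough of their hash codes agree.

```python
import numpy as np


def wta_hash(features, num_hashes=16, window=4, seed=0):
    """Winner-Take-All hash codes for each row of `features`.

    For every hash, a random subset of `window` dimensions is drawn
    (the first `window` entries of a random permutation) and the code
    is the position of the largest value inside that subset.
    All default values are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    n, d = features.shape
    codes = np.empty((n, num_hashes), dtype=np.int64)
    for h in range(num_hashes):
        subset = rng.permutation(d)[:window]
        codes[:, h] = np.argmax(features[:, subset], axis=1)
    return codes


def pairwise_pseudo_labels(codes, min_matches):
    """Binary (n, n) matrix: 1 where two samples share at least
    `min_matches` hash codes, 0 otherwise. The threshold rule is an
    assumption, not taken from the paper."""
    matches = (codes[:, None, :] == codes[None, :, :]).sum(axis=2)
    return (matches >= min_matches).astype(np.int64)
```

Because the codes depend only on the rank order of the selected dimensions, they do not change when the features are rescaled by a positive factor or shifted by a constant, which is one reason such codes are often used as a robust similarity signal.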
Related papers
- Cross-modal Discrete Representation Learning (2021) 10.61
- Multimodal Clustering Networks For Self-supervised Learning From Unlabeled Videos (2021) 13.28
- Learning Shared Representations From Unpaired Data (2025) 0.00
- A Discriminative Vectorial Framework For Multi-modal Feature Representation (2021) 8.60
- Category-oriented Representation Learning For Image To Multi-modal Retrieval (2023) 0.00
- Learning Robust Visual-semantic Embeddings (2017) 15.22
- Object Category Learning And Retrieval With Weak Supervision (2018) 0.00
- Dual Pose-invariant Embeddings: Learning Category And Object-specific Discriminative Representations For Recognition And Retrieval (2024) 4.52