Conditional Cross Attention Network For Multi-space Embedding Without Entanglement In Only A SINGLE Network
2023 · Chull Hwan Song, Taebaek Hwang, Jooyoung Yoon, et al.
Abstract
Many studies in vision tasks have aimed to create effective embedding spaces for single-label object prediction within an image. However, in reality, most objects possess multiple specific attributes, such as shape, color, and length, with each attribute composed of various classes. To apply models in real-world scenarios, it is essential to be able to distinguish between the granular components of an object. Conventional approaches to embedding multiple specific attributes into a single network often result in entanglement, where fine-grained features of each attribute cannot be identified separately. To address this problem, we propose a Conditional Cross-Attention Network that induces disentangled multi-space embeddings for various specific attributes with only a single backbone. Firstly, we employ a cross-attention mechanism to fuse and switch the information of conditions (specific attributes), and we demonstrate its effectiveness through diverse visualization examples. Secondly,
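The idea of "switching" conditions through cross-attention can be illustrated with a small sketch. This is not the authors' implementation. It assumes that each condition (attribute) owns a learned query vector, that this query attends over the patch tokens produced by one shared backbone, and that the attended output, L2-normalized, is the embedding in that condition's space. The class `ConditionalCrossAttentionHead`, the helper functions, the token count, the width, and the mapping of condition 0 to color and condition 1 to shape are all hypothetical. The weights are random, so the example shows only the data flow, not learned behavior.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, v):
    # W is a list of rows, so this computes W @ v.
    return [dot(row, v) for row in W]


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def cross_attention(query, tokens, Wq, Wk, Wv):
    """Single-head scaled dot-product attention: one query over many tokens."""
    q = matvec(Wq, query)
    keys = [matvec(Wk, t) for t in tokens]
    values = [matvec(Wv, t) for t in tokens]
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([dot(q, k) * scale for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]


class ConditionalCrossAttentionHead:
    """Hypothetical head with one query vector per condition (attribute).

    All parameters are randomly initialized here; in a real model they
    would be learned during training.
    """

    def __init__(self, num_conditions, dim, seed=0):
        rng = random.Random(seed)

        def rand_matrix():
            return [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
                    for _ in range(dim)]

        self.condition_queries = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                                  for _ in range(num_conditions)]
        self.Wq, self.Wk, self.Wv = rand_matrix(), rand_matrix(), rand_matrix()

    def embed(self, tokens, condition):
        # Switching `condition` switches the query, so the same shared tokens
        # are pooled differently and land in a different embedding space.
        query = self.condition_queries[condition]
        return l2_normalize(cross_attention(query, tokens, self.Wq, self.Wk, self.Wv))


# Stand-in for the output of a single shared backbone: 16 patch tokens of width 8.
rng = random.Random(42)
patch_tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]

head = ConditionalCrossAttentionHead(num_conditions=3, dim=8)
color_embedding = head.embed(patch_tokens, condition=0)  # hypothetical: 0 = color
shape_embedding = head.embed(patch_tokens, condition=1)  # hypothetical: 1 = shape
```

The backbone runs once per image, and only the small attention head depends on the condition. That is one way a single network can serve several attribute-specific embedding spaces. Similarity between two images would then be compared only within the same condition, for example as the dot product of their normalized embeddings for that condition.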
Related papers
- MHSAN: Multi-head Self-attention Network For Visual Semantic Embedding (2020) · 10.48
- Conditional Similarity Networks (2016) · 15.06
- Generalized Multi-view Embedding For Visual Recognition And Cross-modal Retrieval (2016) · 14.69
- Target-oriented Deformation Of Visual-semantic Embedding Space (2019) · 4.52
- Single-branch Network For Multimodal Training (2023) · 13.26
- LILE: Look In-depth Before Looking Elsewhere -- A Dual Attention Network Using Transformers For Cross-modal Information Retrieval In Histopathology Archives (2022) · 0.00
- Improving Cross-modal Retrieval With Set Of Diverse Embeddings (2022) · 13.55
- Unified Representation Learning For Cross Model Compatibility (2020) · 5.24