VMCML: Video And Music Matching Via Cross-modality Lifting
2023 · Yi-Shan Lee, Wei-Cheng Tseng, Fu-En Wang, et al.
Abstract
We propose a content-based system for matching video and background music. The system aims to address the challenges in music recommendation for new users or new music, given short-form videos. To this end, we propose a cross-modal framework, VMCML, that finds a shared embedding space between video and music representations. To ensure the embedding space can be effectively shared by both representations, we leverage CosFace loss, a margin-based cosine similarity loss. Furthermore, we establish a large-scale dataset called MSVD, in which we provide 390 individual music tracks and the corresponding 150,000 matched videos. We conduct extensive experiments on YouTube-8M and our MSVD datasets. Our quantitative and qualitative results demonstrate the effectiveness of our proposed framework, which achieves state-of-the-art video and music matching performance.
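The margin-based cosine loss mentioned in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each music track is treated as a class and that video and music embeddings are both scored against the same normalized class vectors; the abstract does not spell out these details. The function names, the `scale` and `margin` defaults, and the toy vectors are illustrative choices, not values taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def cosface_loss(embedding, label, class_weights, scale=30.0, margin=0.35):
    """CosFace-style loss for one embedding (video or music).

    The cosine to the true class is reduced by `margin` before the softmax,
    so the embedding must be closer to its own class by at least that margin.
    `scale` and `margin` are assumed hyperparameters.
    """
    logits = []
    for j, w in enumerate(class_weights):
        cos = cosine(embedding, w)
        logits.append(scale * (cos - margin if j == label else cos))
    # Numerically stable log-sum-exp.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[label]


def rank_music(video_embedding, music_embeddings):
    """Return music indices sorted by descending cosine similarity to the video."""
    sims = [cosine(video_embedding, m) for m in music_embeddings]
    return sorted(range(len(sims)), key=lambda i: -sims[i])
```

At retrieval time, under the same assumptions, a video embedding would be compared with every music embedding in the shared space and the closest tracks returned first, as `rank_music` does.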
Related papers
- Deep Music Retrieval For Fine-grained Videos By Exploiting Cross-modal-encoded Voice-overs (2021) 6.34
- Audio-visual Embedding For Cross-modal Musicvideo Retrieval Through Supervised Deep CCA (2019) 11.93
- MVBIND: Self-supervised Music Recommendation For Videos Via Embedding Space Binding (2024) 0.00
- Content-based Video-music Retrieval Using Soft Intra-modal Structure Constraint (2017) 3.60
- Emotion Embedding Spaces For Matching Music To Stories (2021) 0.00
- Perfect Match: Improved Cross-modal Embeddings For Audio-visual Synchronisation (2018) 14.19
- Contrastive Learning For Cross-modal Artist Retrieval (2023) 0.00
- A Multimodal Deep Learning Framework For Scalable Content Based Visual Media Retrieval (2021) 0.00