Multimodal Dataset Normalization And Perceptual Validation For Music-taste Correspondences
2026 · Matteo Spanio, Valentina Frezzato, Antonio Rodà
Abstract
Collecting large, aligned cross-modal datasets for music-flavor research is difficult because perceptual experiments are costly and small by design. We address this bottleneck through two complementary experiments. The first tests whether audio-flavor correlations, feature-importance rankings, and latent-factor structure transfer from an experimental soundtracks collection (257 tracks with human annotations) to a large FMA-derived corpus (\(\sim\)49,300 segments with synthetic labels). The second validates computational flavor targets, derived from food chemistry via a reproducible pipeline, against human perception in an online listener study (49 participants, 20 tracks). Results from both experiments converge: the quantitative transfer analysis confirms that cross-modal structure is preserved across supervision regimes, and the perceptual evaluation shows significant alignment between computational targets and listener ratings (permutation \(p<0.0001\), Mantel \(r=0.45\), Procrus
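For readers unfamiliar with the Mantel statistic reported in the abstract, the following is a minimal sketch of a Mantel permutation test between two distance matrices. It is not the authors' pipeline: the function names, the toy distance matrices, the choice of Pearson correlation, the one-sided alternative, and the permutation count are all assumptions made for illustration.

```python
import random
from math import sqrt


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mantel_test(dist_a, dist_b, n_permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    rng = random.Random(seed)
    n = len(dist_a)
    flat_a = upper_triangle(dist_a)
    observed = pearson(flat_a, upper_triangle(dist_b))
    at_least_as_large = 0
    for _ in range(n_permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Relabel the items of dist_b: rows and columns are permuted together.
        shuffled = [[dist_b[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
        if pearson(flat_a, upper_triangle(shuffled)) >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (at_least_as_large + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical toy data: five items placed on a line in two "spaces",
# where the second space is the first one stretched by a factor of two.
positions_a = [0, 1, 2, 4, 7]
positions_b = [0, 2, 4, 8, 14]
dist_a = [[abs(p - q) for q in positions_a] for p in positions_a]
dist_b = [[abs(p - q) for q in positions_b] for p in positions_b]

r_obs, p_val = mantel_test(dist_a, dist_b)
print(f"Mantel r = {r_obs:.3f}, permutation p = {p_val:.4f}")
```

In a setting like the one described, `dist_a` and `dist_b` would presumably be pairwise distances between the same tracks in the computational-target space and in the listener-rating space. Note that the smallest p-value this scheme can report is 1 / (n_permutations + 1), so resolving values below 0.0001 would require at least 9,999 permutations.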
Related papers
- Unified Cross-modal Translation Of Score Images, Symbolic Music, And Performance Audio (2025)
- MusicTM-Dataset For Joint Representation Learning Among Sheet Music, Lyrics, And Musical Audio (2020)
- Play As You Like: Timbre-enhanced Multi-modal Music Style Transfer (2018)
- Multi-modality In Music: Predicting Emotion In Music From High-level Audio Features And Lyrics (2023)
- Perceptual Musical Features For Interpretable Audio Tagging (2023)
- Expressivity-aware Music Performance Retrieval Using Mid-level Perceptual Features And Emotion Word Embeddings (2024)
- Stylus: Repurposing Stable Diffusion For Training-free Music Style Transfer On Mel-spectrograms (2024)
- Latent Diffusion Bridges For Unsupervised Musical Audio Timbre Transfer (2024)