Supervised And Unsupervised Learning Of Audio Representations For Music Understanding
2022 · Matthew C. McCallum, Filip Korzeniowski, Sergio Oramas, et al.
Abstract
In this work, we provide a broad comparative analysis of strategies for pre-training audio understanding models for several tasks in the music domain, including labelling of genre, era, origin, mood, instrumentation, key, pitch, vocal characteristics, tempo and sonority. Specifically, we explore how the domain of pre-training datasets (music or generic audio) and the pre-training methodology (supervised or unsupervised) affect the adequacy of the resulting audio embeddings for downstream tasks. We show that models trained via supervised learning on large-scale expert-annotated music datasets achieve state-of-the-art performance in a wide range of music labelling tasks, each with novel content and vocabularies. This can be done efficiently with models containing fewer than 100 million parameters that require no fine-tuning or reparameterization for downstream tasks, making this approach practical for industry-scale audio catalogs. Within the class of unsupervised learning
Related papers
- Learning Music Audio Representations Via Weak Language Supervision (2021) · 10.07
- An Empirical Study Of Weakly Supervised Audio Tagging Embeddings For General Audio Representations (2022) · 0.00
- Supervised Metric Learning For Music Structure Features (2021) · 0.00
- AudioLDM 2: Learning Holistic Audio Generation With Self-supervised Pretraining (2023) · 0.00
- Self-supervised Learning Of Context-aware Pitch Prosody Representations (2020) · 0.00
- On The Transferability Of Large-scale Self-supervision To Few-shot Audio Classification (2024) · 3.58
- MuMu-LLaMA: Multi-modal Music Understanding And Generation Via Large Language Models (2024) · 6.34
- Machine Learning Framework For Audio-based Content Evaluation Using MFCC, Chroma, Spectral Contrast, And Temporal Feature Engineering (2024) · 0.00