Modeling Music Modality With A Key-class Invariant Pitch Chroma CNN
2019 · Anders Elowsson, Anders Friberg
Abstract
This paper presents a convolutional neural network (CNN) that uses input from a polyphonic pitch estimation system to predict perceived minor/major modality in music audio. The pitch activation input is structured so that the first CNN layer can compute two pitch chromas focused on different octaves. The following layers perform harmony analysis across chroma and time scales. Through max pooling across pitch, the CNN becomes invariant with regard to the key class (i.e., the key disregarding mode) of the music. A multilayer perceptron combines the modality activation output with spectral features for the final prediction. The study uses a dataset of 203 excerpts, each rated by around 20 listeners; this small and challenging dataset requires carefully designed parameter sharing. With an R² of about 0.71, the system clearly outperforms previous systems as well as individual human listeners. A final ablation study highlights the importance of using pitch activations processed across longer time …
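As a rough illustration of how max pooling across pitch yields key-class invariance, the sketch below folds semitone pitch activations into a 12-bin chroma, correlates it with a major-triad and a minor-triad template at all 12 circular shifts, and max-pools over the shifts. The hand-set templates, the 48-bin input grid starting at pitch class C, the `octave_weights` argument (loosely mimicking a chroma focused on particular octaves), and all function names are assumptions for illustration only; the paper's CNN learns its filters and analyzes harmony across several time scales, which this toy example does not attempt.

```python
import numpy as np

# Hand-set triad templates over 12 pitch classes (index 0 = C). These are
# illustrative assumptions; the paper's CNN learns its filters from data.
MAJOR_TRIAD = np.zeros(12)
MAJOR_TRIAD[[0, 4, 7]] = 1.0  # root, major third, perfect fifth
MINOR_TRIAD = np.zeros(12)
MINOR_TRIAD[[0, 3, 7]] = 1.0  # root, minor third, perfect fifth


def pitch_to_chroma(pitch_act, octave_weights=None):
    """Fold a (n_octaves * 12, n_frames) semitone activation into a (12, n_frames) chroma.

    Assumes bin 0 is pitch class C. octave_weights (length n_octaves) lets the
    chroma emphasize some octaves over others; None weights all octaves equally.
    """
    n_bins, n_frames = pitch_act.shape
    if n_bins % 12 != 0:
        raise ValueError("number of pitch bins must be a multiple of 12")
    octaves = pitch_act.reshape(n_bins // 12, 12, n_frames)
    if octave_weights is None:
        octave_weights = np.ones(n_bins // 12)
    # Weighted sum over the octave axis -> (12, n_frames)
    return np.tensordot(np.asarray(octave_weights, dtype=float), octaves, axes=(0, 0))


def max_pooled_response(chroma, template):
    """Correlate template with chroma at all 12 circular shifts, then max-pool over shifts.

    Transposing the input by s semitones only permutes the 12 shift responses,
    so their maximum does not depend on the key class.
    """
    responses = np.stack(
        [template @ np.roll(chroma, -k, axis=0) for k in range(12)]
    )  # shape (12, n_frames)
    return responses.max(axis=0)


def modality_score(chroma):
    """Positive for major-like frames, negative for minor-like frames."""
    return max_pooled_response(chroma, MAJOR_TRIAD) - max_pooled_response(chroma, MINOR_TRIAD)


# Usage: a C major triad in a hypothetical 4-octave grid whose bin 0 is C.
act = np.zeros((48, 1))
act[[24, 28, 31], 0] = 1.0
print(modality_score(pitch_to_chroma(act)))                      # prints [1.]
print(modality_score(pitch_to_chroma(np.roll(act, 5, axis=0))))  # F major, prints [1.]
```

Because a transposition only reorders the shift responses, the C major triad and its transposition to F major receive the same score, while a minor triad scores negative. This is the same reason pooling over pitch lets a model ignore the key class and focus on modality.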
Related papers
- Genre-agnostic Key Classification With Convolutional Neural Networks (2018)
- HPPNet: Modeling The Harmonic Structure And Pitch Invariance In Piano Transcription (2022)
- Deep-learning Architectures For Multi-pitch Estimation: Towards Reliable Evaluation (2022)
- Explaining Deep Convolutional Neural Networks On Music Classification (2016)
- Vocal Melody Extraction Using Patch-based CNN (2018)
- Invariances And Data Augmentation For Supervised Music Transcription (2017)
- Between Homomorphic Signal Processing And Deep Neural Networks: Constructing Deep Algorithms For Polyphonic Music Transcription (2017)
- Sample-level CNN Architectures For Music Auto-tagging Using Raw Waveforms (2017)