Spectral And Rhythm Features For Audio Classification With Deep Convolutional Neural Networks
2024 · Friedrich Wolf-Monheim
Abstract
Convolutional neural networks (CNNs) are widely used in computer vision. Beyond recognizing patterns in conventional digital images, they can also extract features from images that represent spectral and rhythm features of time-domain digital audio signals, which enables the acoustic classification of sounds. This work investigates the audio classification performance of a deep convolutional neural network on several spectral and rhythm feature representations: mel-scaled spectrograms, mel-frequency cepstral coefficients (MFCCs), cyclic tempograms, short-time Fourier transform (STFT) chromagrams, constant-Q transform (CQT) chromagrams and chroma energy normalized statistics (CENS) chromagrams. The mel-scaled spectrograms and the MFCCs perform significantly better than the other spectral and rhythm features investigated in this research for audio classification tasks usin
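To make the two best-performing representations concrete, the following is a minimal, dependency-free sketch of how a log-mel spectrogram and MFCCs can be derived from a time-domain signal. It is not the author's pipeline: the sample rate, FFT size, hop length, number of mel bands, the HTK-style mel formula, and the unnormalized DCT-II are all assumptions chosen for illustration, and the naive DFT is used only to avoid external libraries.

```python
import cmath
import math

def hz_to_mel(f):
    # HTK-style mel formula (one common convention; an assumption here).
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrogram(signal, n_fft, hop):
    """Hann-windowed STFT power via a naive DFT (slow, but needs no libraries)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    twiddle = [cmath.exp(-2j * math.pi * k / n_fft) for k in range(n_fft)]
    n_bins = n_fft // 2 + 1
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        frames.append([
            abs(sum(frame[n] * twiddle[(k * n) % n_fft] for n in range(n_fft))) ** 2
            for k in range(n_bins)
        ])
    return frames  # indexed as [frame][frequency bin]

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    mel_max = hz_to_mel(sr / 2)
    edges = [mel_to_hz(mel_max * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sr / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([
            max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
            for f in bin_hz
        ])
    return bank  # indexed as [mel band][frequency bin]

def log_mel_spectrogram(signal, sr, n_fft=256, hop=128, n_mels=20):
    power = power_spectrogram(signal, n_fft, hop)
    bank = mel_filterbank(n_mels, n_fft, sr)
    # Small offset avoids log(0) in silent bands.
    return [
        [10.0 * math.log10(sum(w * p for w, p in zip(row, frame)) + 1e-10)
         for row in bank]
        for frame in power
    ]

def mfcc(log_mel, n_mfcc=13):
    """Unnormalized DCT-II across the mel bands of each frame."""
    n_mels = len(log_mel[0])
    return [
        [sum(frame[m] * math.cos(math.pi * c * (m + 0.5) / n_mels)
             for m in range(n_mels))
         for c in range(n_mfcc)]
        for frame in log_mel
    ]

# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000.0 * t / sr) for t in range(2000)]
log_mel = log_mel_spectrogram(tone, sr)
coeffs = mfcc(log_mel)
```

Either `log_mel` or `coeffs` can be treated as a two-dimensional, image-like array (time by mel band or cepstral coefficient) and passed to a CNN. For the test tone, the energy should concentrate in the mel band that covers 1 kHz. In practice an audio library with an FFT-based implementation would replace the naive DFT used here.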
Related papers
- Explaining Deep Convolutional Neural Networks On Music Classification (2016)
- Audio Classification Of Low Feature Spectrograms Utilizing Convolutional Neural Networks (2024)
- A Deep Neural Network For Audio Classification With A Classifier Attention Mechanism (2020)
- Music Genre Classification: A Comparative Analysis Of CNN And XGBoost Approaches With Mel-frequency Cepstral Coefficients And Mel Spectrograms (2024)
- Acoustic Scene Classification Using Convolutional Neural Network And Multiple-width Frequency-delta Data Augmentation (2016)
- Automatic Tagging Using Deep Convolutional Neural Networks (2016)
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017)
- Audio-based Music Classification With DenseNet And Data Augmentation (2019)