Utilizing Domain Knowledge In End-to-end Audio Processing
2017 · Tycho Max Sylvester Tax, Jose Luis Diez Antich, Hendrik Purwins, et al.
Abstract
End-to-end neural-network-based approaches to audio modelling are generally outperformed by models trained on high-level data representations. In this paper we present preliminary work that, first, shows the feasibility of training the first layers of a deep convolutional neural network (CNN) model to learn the commonly used log-scaled mel-spectrogram transformation. Second, we demonstrate that when the first layers of an end-to-end CNN classifier are initialized with the learned transformation, convergence and performance on the ESC-50 environmental sound classification dataset are similar to those of a CNN-based model trained on the highly pre-processed log-scaled mel-spectrogram features.
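For orientation, the log-scaled mel-spectrogram that the first layers are trained to reproduce can be sketched in a few lines of NumPy. This is only an illustrative reference computation of the target transformation, not the authors' network or preprocessing pipeline. The function names, the HTK-style mel formula, and all parameter values (`sr`, `n_fft`, `hop`, `n_mels`, the `1e-6` offset) are assumptions chosen for the example; the abstract does not state the settings used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wave, sr=44100, n_fft=1024, hop=512, n_mels=40):
    # Slice the waveform into overlapping, Hann-windowed frames.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[t * hop : t * hop + n_fft] * window for t in range(n_frames)]
    )
    # Power spectrum per frame: shape (n_frames, n_fft // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Project onto the mel filters, then compress with a logarithm.
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return np.log(mel + 1e-6)  # small offset avoids log(0)

if __name__ == "__main__":
    sr = 8000
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz sine
    features = log_mel_spectrogram(tone, sr=sr)
    print(features.shape)  # (frames, mel bands)
```

In the setting the abstract describes, an array like `features` would play the role of the regression target for the first CNN layers, with the raw waveform as input; the learned weights would then initialize the front end of the end-to-end classifier.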
Related papers
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017)
- Combining High-level Features Of Raw Audio Waves And Mel-spectrograms For Audio Tagging (2018)
- Spectral And Rhythm Features For Audio Classification With Deep Convolutional Neural Networks (2024)
- Sample-level CNN Architectures For Music Auto-tagging Using Raw Waveforms (2017)
- Audio Classification Of Low Feature Spectrograms Utilizing Convolutional Neural Networks (2024)
- Dynamic Convolutional Neural Networks As Efficient Pre-trained Audio Models (2023)
- Conditional End-to-end Audio Transforms (2018)
- PERSA+: A Deep Learning Front-end For Context-agnostic Audio Classification (2021)