Audio Concept Classification With Hierarchical Deep Neural Networks
2017 · Mirco Ravanelli, Benjamin Elizalde, Karl Ni, et al.
Abstract
Audio-based multimedia retrieval tasks may identify semantic information in audio streams, i.e., audio concepts (such as music, laughter, or a revving engine). Conventional Gaussian Mixture Models (GMMs) have had some success in classifying a reduced set of audio concepts. However, multi-class classification can benefit from context window analysis and the discriminating power of deeper architectures. Although deep learning has shown promise in applications such as speech and object recognition, it has not yet met expectations in other fields such as audio concept classification. This paper explores, for the first time, the potential of deep learning in classifying audio concepts on User-Generated Content videos. The proposed system consists of two cascaded neural networks in a hierarchical configuration that analyze short- and long-term context information. Our system outperforms a GMM approach by a relative 54%, a Neural Network by 33%, and a Deep Neural Network by 12% on
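The cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the first network sees a short window of acoustic frames and the second network sees a longer window of the first network's posteriors, which the abstract does not spell out. The feature size, class count, context widths, ReLU hidden layer, and the names `stack_context`, `TinyMLP`, and `hierarchical_posteriors` are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def stack_context(frames, left, right):
    """Concatenate each frame with its neighbours (edges repeat the first/last frame)."""
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - left, t + right + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        stacked.append(window)
    return stacked


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyMLP:
    """One-hidden-layer network with random, untrained weights (illustration only)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]

    def forward(self, x):
        # ReLU hidden layer followed by a softmax output layer.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x))) for row in self.w1]
        return softmax([sum(w * h for w, h in zip(row, hidden)) for row in self.w2])


def hierarchical_posteriors(frames, short_net, long_net, short_ctx, long_ctx):
    # Level 1: short-term context over the acoustic feature frames.
    level1 = [short_net.forward(x) for x in stack_context(frames, short_ctx, short_ctx)]
    # Level 2 (assumed): long-term context over the level-1 class posteriors.
    return [long_net.forward(x) for x in stack_context(level1, long_ctx, long_ctx)]


# Hypothetical sizes: 13 features per frame, 3 audio concepts, 20 frames.
rng = random.Random(0)
N_FEATS, N_CLASSES = 13, 3
SHORT_CTX, LONG_CTX = 2, 5
frames = [[rng.gauss(0, 1) for _ in range(N_FEATS)] for _ in range(20)]
short_net = TinyMLP(N_FEATS * (2 * SHORT_CTX + 1), 16, N_CLASSES, rng)
long_net = TinyMLP(N_CLASSES * (2 * LONG_CTX + 1), 16, N_CLASSES, rng)
posteriors = hierarchical_posteriors(frames, short_net, long_net, SHORT_CTX, LONG_CTX)
```

Each entry of `posteriors` is a per-frame distribution over the hypothetical concept classes. In a real system both networks would be trained, and the second stage would smooth first-stage decisions using the wider temporal context.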
Related papers
- A Deep Neural Network For Audio Classification With A Classifier Attention Mechanism (2020) 0.00
- Audio-based Music Classification With Densenet And Data Augmentation (2019) 10.48
- PERSA+: A Deep Learning Front-end For Context-agnostic Audio Classification (2021) 0.00
- Audio Classification Of Low Feature Spectrograms Utilizing Convolutional Neural Networks (2024) 5.84
- Audio Scene Classification With Deep Recurrent Neural Networks (2017) 11.29
- Halluaudio: Hallucinating Frequency As Concepts For Few-shot Audio Classification (2023) 3.58
- Reducing Model Complexity For DNN Based Large-scale Audio Classification (2017) 9.59
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017) 13.23