Effective Audio Classification Network Based On Paired Inverse Pyramid Structure And Dense MLP Block
2022 · Yunhao Chen, Yunjie Zhu, Zihui Yan, et al.
Abstract
Recently, massive architectures based on Convolutional Neural Networks (CNNs) and self-attention mechanisms have become necessary for audio classification. Although these techniques are state of the art, their effectiveness can only be guaranteed with huge computational costs and parameter counts, extensive data augmentation, transfer learning from large datasets, and other tricks. By exploiting the lightweight nature of audio, we propose an efficient network structure called the Paired Inverse Pyramid Structure (PIP) and a network built on it, the Paired Inverse Pyramid Structure MLP Network (PIPMN). PIPMN reaches 96% accuracy on Environmental Sound Classification (ESC) with the UrbanSound8K dataset and 93.2% accuracy on Music Genre Classification (MGC) with the GTZAN dataset, using only 1 million parameters. Both results are achieved without data augmentation or model transfer. Public code is available at: https://github.com/JNAIC/PIPMN
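To make a parameter budget such as "1 million parameters" concrete, the sketch below counts the weights and biases in a plain stack of fully connected layers. It is an illustration only, not the PIP structure or the Dense MLP Block: the function name `mlp_param_count` and the layer widths are hypothetical, chosen only so that the output layer matches the 10 classes of UrbanSound8K and GTZAN.

```python
from typing import Sequence


def mlp_param_count(widths: Sequence[int], bias: bool = True) -> int:
    """Count parameters in a stack of fully connected layers.

    Each consecutive pair widths[i] -> widths[i + 1] is one linear layer
    holding widths[i] * widths[i + 1] weights, plus widths[i + 1] biases
    when bias=True.
    """
    if len(widths) < 2:
        raise ValueError("need at least an input width and an output width")
    total = 0
    for fan_in, fan_out in zip(widths[:-1], widths[1:]):
        total += fan_in * fan_out + (fan_out if bias else 0)
    return total


# Hypothetical widths: a 128-dim input feature vector, two hidden layers,
# and 10 output classes. These are NOT PIPMN's actual layer sizes.
widths = [128, 1024, 512, 10]
n = mlp_param_count(widths)
print(n, n <= 1_000_000)  # prints "662026 True"
```

Most of the count comes from the 1024 -> 512 hidden layer (524,800 parameters), which shows why hidden-layer widths dominate an MLP's size when staying under a fixed budget.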
Code
- JNAIC/PIPMN
Related papers
- A Deep Neural Network For Audio Classification With A Classifier Attention Mechanism (2020) · 0.00
- Audio-based Music Classification With DenseNet And Data Augmentation (2019) · 10.48
- AclNet: Efficient End-to-end Audio Classification CNN (2018) · 0.00
- Reducing Model Complexity For DNN Based Large-scale Audio Classification (2017) · 9.59
- Audio Concept Classification With Hierarchical Deep Neural Networks (2017) · 0.00
- MMDenseLSTM: An Efficient Combination Of Convolutional And Recurrent Neural Networks For Audio Source Separation (2018) · 15.28
- PERSA+: A Deep Learning Front-end For Context-agnostic Audio Classification (2021) · 0.00
- Convolutional Gated Recurrent Neural Network Incorporating Spatial Features For Audio Tagging (2017) · 13.23