Acoustic Scene Classification Using Multi-layer Temporal Pooling Based On Convolutional Neural Network
2019 · Liwen Zhang, Jiqing Han
Abstract
The performance of an Acoustic Scene Classification (ASC) system is highly dependent on the latent temporal dynamics of the audio signal. In this paper, we propose a multi-layer temporal pooling method that takes a CNN feature sequence as input. It can effectively capture the temporal dynamics of an entire audio signal of arbitrary duration by building direct connections between the sequence and its time indexes. We applied our novel framework to DCASE 2018 Task 1 (ASC). For evaluation, we trained a Support Vector Machine (SVM) on features learned by the proposed Multi-Layered Temporal Pooling (MLTP). Experimental results on the development dataset show that using the MLTP features significantly improved ASC performance. The best performance, 75.28% accuracy, was achieved with the optimal setting found in our experiments.
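The abstract does not give the MLTP equations, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a rank-pooling-style formulation (as in VideoDarwin): a ridge regression maps the time-varying mean of the CNN frame features to their normalized time indexes, and the learned weight vector serves as a fixed-length descriptor. "Multi-layer" is assumed here to mean pooling over sliding windows and then pooling the resulting sequence again. The function names `temporal_pool` and `multi_layer_temporal_pool` and the parameters `window`, `stride`, `layers`, and `lam` are hypothetical, not taken from the paper.

```python
import numpy as np


def temporal_pool(seq, lam=1.0):
    """Pool a (T, D) feature sequence into one D-dim vector.

    ASSUMPTION: rank-pooling-style ridge regression, not necessarily the
    paper's exact MLTP formulation. Solves
        min_w sum_t (w . v_t - t/T)^2 + lam * ||w||^2
    where v_t is the mean of the first t frames.
    """
    seq = np.asarray(seq, dtype=float)
    T, D = seq.shape
    # Time-varying mean (cumulative average) smooths frame-level noise.
    v = np.cumsum(seq, axis=0) / np.arange(1, T + 1)[:, None]
    # Normalized time indexes in (0, 1]: the regression targets.
    t = np.arange(1, T + 1) / T
    # Closed-form ridge solution: (V^T V + lam * I) w = V^T t.
    A = v.T @ v + lam * np.eye(D)
    return np.linalg.solve(A, v.T @ t)


def multi_layer_temporal_pool(seq, window=8, stride=4, layers=2, lam=1.0):
    """Hypothetical multi-layer variant: pool sliding windows, then pool again.

    Returns a D-dim vector regardless of the input length T.
    """
    x = np.asarray(seq, dtype=float)
    for _ in range(layers - 1):
        T = x.shape[0]
        if T <= window:
            break  # too short for another layer; pool the whole sequence
        starts = range(0, T - window + 1, stride)
        # Each window becomes one D-dim vector, forming a shorter sequence.
        x = np.stack([temporal_pool(x[s:s + window], lam) for s in starts])
    # The final layer pools the remaining sequence into a single vector.
    return temporal_pool(x, lam)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    short_clip = rng.normal(size=(40, 16))   # 40 frames of 16-dim CNN features
    long_clip = rng.normal(size=(300, 16))   # 300 frames, same feature size
    print(multi_layer_temporal_pool(short_clip).shape,
          multi_layer_temporal_pool(long_clip).shape)  # (16,) (16,)
```

Because every clip maps to a vector of the same dimension, clips of arbitrary duration can be stacked into one feature matrix and passed to a standard classifier such as scikit-learn's `LinearSVC`, mirroring the SVM evaluation described above.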
Related papers
- Acoustic Scene Classification Using Bilinear Pooling On Time-liked And Frequency-liked Convolution Neural Network (2020)
- Classifying Variable-length Audio Files With All-convolutional Networks And Masked Global Pooling (2016)
- Convolutional Neural Networks And X-vector Embedding For DCASE2018 Acoustic Scene Classification Challenge (2018)
- Acoustic Scene Classification Using Convolutional Neural Network And Multiple-width Frequency-delta Data Augmentation (2016)
- Audio-visual Scene Classification: Analysis Of DCASE 2021 Challenge Submissions (2021)
- A Comparison Of Pooling Methods On LSTM Models For Rare Acoustic Event Classification (2020)
- A Simple Fusion Of Deep And Shallow Learning For Acoustic Scene Classification (2018)
- Deep CNNs Along The Time Axis With Intermap Pooling For Robustness To Spectral Variations (2016)