Convolutional Neural Networks And X-vector Embedding For DCASE2018 Acoustic Scene Classification Challenge
2018 · Hossein Zeinali, Lukas Burget, Jan Cernocky
Abstract
This paper describes the Brno University of Technology (BUT) team's submissions to Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2018 challenge and analyzes different methods on the leaderboard set. The proposed approach fuses two Convolutional Neural Network (CNN) topologies. The first is a common two-dimensional CNN of the kind mainly used in image classification. The second is a one-dimensional CNN that extracts fixed-length audio segment embeddings, so-called x-vectors, which have also been used in speech processing, especially for speaker recognition. In addition to the different topologies, two types of features were tested: log mel-spectrogram and constant-Q transform (CQT) features. Finally, in the best-performing system, the outputs of the different systems are fused by simple output averaging. Our submissions ranked third among 24 teams in ASC sub-task A (task1a).
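The output-averaging fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which outputs are averaged, so this assumes each system emits per-class posterior probabilities for every clip; the function name `fuse_by_averaging` and the example scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def fuse_by_averaging(system_outputs):
    """Average per-class scores across systems and pick the top class.

    system_outputs: list of arrays, each shaped (n_clips, n_classes).
    Assumption: all systems score the same clips and classes in the same order.
    """
    stacked = np.stack(system_outputs, axis=0)  # (n_systems, n_clips, n_classes)
    fused = stacked.mean(axis=0)                # equal-weight average over systems
    return fused, fused.argmax(axis=1)


# Hypothetical posteriors for 2 clips and 3 scene classes from two systems.
cnn2d_scores = np.array([[0.6, 0.3, 0.1],
                         [0.2, 0.5, 0.3]])
xvector_scores = np.array([[0.2, 0.7, 0.1],
                           [0.1, 0.3, 0.6]])

fused, labels = fuse_by_averaging([cnn2d_scores, xvector_scores])
```

Equal weights keep the fusion free of tunable parameters; a weighted average would need held-out data to set the weights.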
Related papers
- Audio-visual Scene Classification: Analysis Of DCASE 2021 Challenge Submissions (2021)
- Acoustic Scene Classification Using Multi-layer Temporal Pooling Based On Convolutional Neural Network (2019)
- Acoustic Scene Classification Using Bilinear Pooling On Time-liked And Frequency-liked Convolution Neural Network (2020)
- A Study On Joint Modeling And Data Augmentation Of Multi-modalities For Audio-visual Scene Classification (2022)
- Classifying Variable-length Audio Files With All-convolutional Networks And Masked Global Pooling (2016)
- The Xx205 System For The Voxceleb Speaker Recognition Challenge 2020 (2020)
- BUT System Description To Voxceleb Speaker Recognition Challenge 2019 (2019)
- Robust Acoustic Scene Classification In The Presence Of Active Foreground Speech (2021)