Multiple Instance Deep Learning For Weakly Supervised Small-footprint Audio Event Detection
2017 · Shao-Yen Tseng, Juncheng Li, Yun Wang, et al.
Abstract
State-of-the-art audio event detection (AED) systems rely on supervised learning using strongly labeled data. However, this dependence severely limits scalability to large-scale datasets where fine-resolution annotations are too expensive to obtain. In this paper, we propose a small-footprint multiple instance learning (MIL) framework for multi-class AED using weakly annotated labels. The proposed MIL framework uses audio embeddings extracted from a pre-trained convolutional neural network as input features. We show that, by using audio embeddings, the MIL framework can be implemented using a simple DNN with performance comparable to recurrent neural networks. We evaluate our approach by training an audio tagging system using a subset of AudioSet, which is a large collection of weakly labeled YouTube video excerpts. Combined with a late-fusion approach, we improve the F1 score of a baseline audio tagging system by 17%. We show that audio embeddings extracted by the convolutional neural
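As a rough illustration of the multiple instance learning setup the abstract describes, the sketch below treats a clip as a "bag" of per-segment class scores and trains against a single clip-level (weak) label. The abstract does not say how segment scores are aggregated; max pooling is assumed here because it is a common MIL choice. The function names `mil_bag_scores` and `weak_label_loss` and the example numbers are hypothetical, not taken from the paper.

```python
import math

def mil_bag_scores(instance_scores):
    """Aggregate per-segment class probabilities into clip-level scores.

    instance_scores: list of segments, each a list of per-class
    probabilities in [0, 1]. Max pooling over segments is an
    assumption of this sketch, not a detail stated in the abstract.
    """
    return [max(per_class) for per_class in zip(*instance_scores)]

def weak_label_loss(bag_scores, weak_labels, eps=1e-7):
    """Mean binary cross-entropy against clip-level (weak) labels."""
    total = 0.0
    for p, y in zip(bag_scores, weak_labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(bag_scores)

# Hypothetical clip: 3 segments, 2 event classes.
segment_scores = [[0.1, 0.2], [0.9, 0.3], [0.2, 0.1]]
clip_scores = mil_bag_scores(segment_scores)    # one score per class
loss = weak_label_loss(clip_scores, [1, 0])     # class 0 present, class 1 absent
```

With max pooling, only the highest-scoring segment per class contributes to the loss, so a clip-level tag can be learned without knowing when in the clip the event occurs.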
Related papers
- A Light-weight Multimodal Framework For Improved Environmental Audio Tagging (2017)
- Fully Dnn-based Multi-label Regression For Audio Tagging (2016)
- Automatic Audio Captioning Using Attention Weighted Event Based Embeddings (2022)
- Attention And Localization Based On A Deep Convolutional Recurrent Model For Weakly Supervised Audio Tagging (2017)
- An Empirical Study Of Weakly Supervised Audio Tagging Embeddings For General Audio Representations (2022)
- Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor (2024)
- Cross-modal Embeddings For Video And Audio Retrieval (2018)
- Reducing Model Complexity For DNN Based Large-scale Audio Classification (2017)