Fully Few-shot Class-incremental Audio Classification Using Expandable Dual-embedding Extractor
2024 · Yongjie Si, Yanxiong Li, Jialong Li, et al.
Abstract
Few-shot class-incremental audio classification usually assumes that training data is sufficient in the base session. In some practical scenarios, however, it is difficult to collect abundant samples for the base session because data for some classes is scarce. This paper explores a new problem, fully few-shot class-incremental audio classification, in which only a few training samples are available in every session. To solve it, we propose a method that uses an expandable dual-embedding extractor. The proposed model consists of an embedding extractor and an expandable classifier. The embedding extractor consists of a pretrained Audio Spectrogram Transformer (AST) and a finetuned AST. The expandable classifier consists of prototypes, each of which represents a class. Experiments are conducted on three datasets (LS-100, NSynth-100 and FSC-89). Results show that our method exceeds seven baseline methods in average accuracy with statistical significance. Code is available at: https://github.com/YongjieSi/EDE.
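The abstract describes the architecture only at a high level, so the following is a minimal, illustrative sketch rather than the authors' implementation (see the linked repository for that). It assumes, without confirmation from the abstract, that the two AST embeddings are fused by concatenation, that each prototype is the mean embedding of a class's few support samples, and that queries are assigned to the prototype with the highest cosine similarity. The names `dual_embedding`, `PrototypeClassifier`, `add_session` and the toy extractors are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Extractor = Callable[[Sequence[float]], Vector]


def dual_embedding(x: Sequence[float], frozen: Extractor, tuned: Extractor) -> Vector:
    """Fuse two embeddings by concatenation (an assumed fusion rule)."""
    return list(frozen(x)) + list(tuned(x))


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


class PrototypeClassifier:
    """Expandable classifier: one prototype per class, grown session by session."""

    def __init__(self) -> None:
        self.prototypes: Dict[str, Vector] = {}

    def add_session(self, support: Dict[str, List[Vector]]) -> None:
        # Assumption: a prototype is the mean of the class's support embeddings.
        for label, embeddings in support.items():
            self.prototypes[label] = mean_vector(embeddings)

    def predict(self, embedding: Vector) -> str:
        # Assumption: nearest prototype under cosine similarity.
        return max(self.prototypes, key=lambda c: cosine(embedding, self.prototypes[c]))


# Toy stand-ins for the pretrained and finetuned ASTs (hypothetical).
def toy_frozen(x: Sequence[float]) -> Vector:
    return [x[0], x[1]]


def toy_tuned(x: Sequence[float]) -> Vector:
    return [x[0] + x[1]]


clf = PrototypeClassifier()
# Session 0: few-shot samples of one class.
clf.add_session({"dog": [dual_embedding(s, toy_frozen, toy_tuned)
                         for s in ([1.0, 0.0], [0.9, 0.1])]})
# Session 1: a new class arrives; the classifier expands by one prototype.
clf.add_session({"siren": [dual_embedding([0.0, 1.0], toy_frozen, toy_tuned)]})
prediction = clf.predict(dual_embedding([0.8, 0.2], toy_frozen, toy_tuned))
```

In this sketch, adding a session only appends prototypes and leaves earlier ones untouched, which is one simple way to read "expandable"; how the actual method builds or updates prototypes may differ.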
Related papers
- Towards Robust Few-shot Class Incremental Learning In Audio Classification Using Contrastive Representation (2024)
- Semi Supervised Learning For Few-shot Audio Classification By Episodic Triplet Mining (2021)
- Halluaudio: Hallucinating Frequency As Concepts For Few-shot Audio Classification (2023)
- On The Transferability Of Large-scale Self-supervision To Few-shot Audio Classification (2024)
- Few-shot Speaker Identification Using Depthwise Separable Convolutional Network With Channel Attention (2022)
- Few Shot Speaker Recognition Using Deep Neural Networks (2019)
- Multiple Instance Deep Learning For Weakly Supervised Small-footprint Audio Event Detection (2017)
- Dropclass And Dropadapt: Dropping Classes For Deep Speaker Representation Learning (2020)