On The Transferability Of Large-scale Self-supervision To Few-shot Audio Classification
2024 · Calum Heggan, Sam Budgett, Timothy Hospedales, et al.
Abstract
In recent years, self-supervised learning has stood out for its capacity to learn robust feature representations from unlabelled data. Networks pretrained through self-supervision serve as effective feature extractors for downstream tasks, including few-shot learning. While the evaluation of unsupervised approaches for few-shot learning is well established in imagery, it is notably absent in acoustics. This study addresses this gap by assessing the performance of large-scale self-supervised models in few-shot audio classification. Additionally, we explore the relationship between a model's few-shot learning capability and its performance on other downstream task benchmarks. Our findings reveal state-of-the-art performance on some few-shot problems such as SpeechCommandsv2, as well as strong correlations between speech-based few-shot problems and various downstream audio tasks.
Related papers
- On The Transferability Of Whisper-based Representations For "in-the-wild" Cross-task Downstream Speech Applications (2023)
- Learning Problem-agnostic Speech Representations From Multiple Self-supervised Tasks (2019)
- Supervised Acoustic Embeddings And Their Transferability Across Languages (2023)
- Conformer-based Self-supervised Learning For Non-speech Audio Tasks (2021)
- Efficient Personalized Speech Enhancement Through Self-supervised Learning (2021)
- Semi Supervised Learning For Few-shot Audio Classification By Episodic Triplet Mining (2021)
- Audio Mamba: Selective State Spaces For Self-supervised Audio Representations (2024)
- The Efficacy Of Self-supervised Speech Models For Audio Representations (2022)