Conformer-based Self-supervised Learning For Non-speech Audio Tasks
2021 · Sangeeta Srivastava, Yun Wang, Andros Tjandra, et al.
Abstract
Representation learning from unlabeled data has been of major interest in artificial intelligence research. While self-supervised speech representation learning has been popular in the speech research community, very few works have comprehensively analyzed audio representation learning for non-speech audio tasks. In this paper, we propose a self-supervised audio representation learning method and apply it to a variety of downstream non-speech audio tasks. We combine the well-known wav2vec 2.0 framework, which has shown success in self-supervised learning for speech tasks, with parameter-efficient conformer architectures. Our self-supervised pre-training can reduce the need for labeled data by two-thirds. On the AudioSet benchmark, we achieve a mean average precision (mAP) score of 0.415, which is a new state-of-the-art on this dataset through audio-only self-supervised learning. Our fine-tuned conformers also surpass or match the performance of previous systems pre-trained in a supervi
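On AudioSet, the mAP figure quoted above is conventionally the unweighted mean, over all classes, of each class's average precision (AP). A class's AP is the mean of the precision values at each rank where a positive clip appears, with clips sorted by descending score. The plain-Python sketch below illustrates that convention. The function names are hypothetical. Breaking ties by input order and skipping classes that have no positive clips are assumptions, not details the abstract gives. Whether the paper's evaluation used exactly this procedure is also an assumption.

```python
from typing import List, Optional, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> Optional[float]:
    """Non-interpolated AP for one class.

    Returns None when the class has no positive clips (assumption: such
    classes are left out of the mean rather than scored as 0).
    """
    # Rank clips by descending score; Python's sort is stable, so ties
    # keep their input order (an assumed tie-breaking rule).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits = 0
    precisions: List[float] = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive's rank
    return sum(precisions) / len(precisions) if precisions else None


def mean_average_precision(scores: Sequence[Sequence[float]],
                           labels: Sequence[Sequence[int]]) -> float:
    """mAP: unweighted mean of per-class AP; inputs are indexed [clip][class]."""
    num_classes = len(scores[0])
    aps = []
    for c in range(num_classes):
        ap = average_precision([row[c] for row in scores],
                               [row[c] for row in labels])
        if ap is not None:
            aps.append(ap)
    return sum(aps) / len(aps)


# Toy example: 3 clips, 2 classes (multi-label targets, as in AudioSet).
scores = [[0.9, 0.2], [0.8, 0.7], [0.1, 0.6]]
labels = [[1, 0], [0, 1], [1, 0]]
print(mean_average_precision(scores, labels))
```

In the toy input, class 0 has positives at ranks 1 and 3, so its AP is (1 + 2/3) / 2 = 5/6. Class 1's only positive is ranked first, so its AP is 1. The mAP is therefore 11/12, or about 0.917.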
Related papers
- Universal Paralinguistic Speech Representations Using Self-supervised Conformers (2021)
- wav2vec 2.0: A Framework For Self-supervised Learning Of Speech Representations (2020)
- Learning Speech Representations From Raw Audio By Joint Audiovisual Self-supervision (2020)
- Automatic Data Augmentation Selection And Parametrization In Contrastive Self-supervised Speech Representation Learning (2022)
- Learning Self-supervised Audio-visual Representations For Sound Recommendations (2024)
- Learning Problem-agnostic Speech Representations From Multiple Self-supervised Tasks (2019)
- The Efficacy Of Self-supervised Speech Models For Audio Representations (2022)
- CCC-wav2vec 2.0: Clustering Aided Cross Contrastive Self-supervised Learning Of Speech Representations (2022)