SUPERB-SG: Enhanced Speech Processing Universal Performance Benchmark For Semantic And Generative Capabilities
2022 Β· Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, et al.
Abstract
Transfer learning has proven to be crucial in advancing the state of speech and natural language processing research in recent years. In speech, a model pre-trained by self-supervised learning transfers remarkably well on multiple tasks. However, the lack of a consistent evaluation methodology is limiting towards a holistic understanding of the efficacy of such models. SUPERB was a step towards introducing a common benchmark to evaluate pre-trained models across various speech tasks. In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB. We use a lightweight methodology to test the robustness of representations learned by pre-trained models under shifts in data domain and quality across different types of tasks. It entails freezing pre-trained model parameters, only using simple task-specific trainable heads. The goal is to be inclusive of all re
Authors
(none)
Tags
Stats
Related papers
- SUPERB @ SLT 2022: Challenge On Generalization And Efficiency Of Self-supervised Speech Representation Learning (2022)9.23
- ML-SUPERB: Multilingual Speech Universal Performance Benchmark (2023)12.47
- Dynamic-superb: Towards A Dynamic, Collaborative, And Comprehensive Instruction-tuning Benchmark For Speech (2023)0.00
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, And Datasets (2024)4.52
- Dynamic-superb Phase-2: A Collaboratively Expanding Benchmark For Measuring The Capabilities Of Spoken Language Models With 180 Tasks (2024)4.61
- Findings Of The 2023 ML-SUPERB Challenge: Pre-training And Evaluation Over More Languages And Beyond (2023)0.00
- Characterizing The Adversarial Vulnerability Of Speech Self-supervised Learning (2021)4.52
- On The Transferability Of Whisper-based Representations For "in-the-wild" Cross-task Downstream Speech Applications (2023)0.00