ML-SUPERB: Multilingual Speech Universal Performance Benchmark
2023 · Jiatong Shi, Dan Berrebbi, William Chen, et al.
Abstract
Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard to benchmark the performance of Self-Supervised Learning (SSL) models on various speech processing tasks. However, SUPERB largely considers English speech in its evaluation. This paper presents multilingual SUPERB (ML-SUPERB), covering 143 languages (ranging from high-resource to endangered), and considering both automatic speech recognition and language identification. Following the concept of SUPERB, ML-SUPERB utilizes frozen SSL features and employs a simple framework for multilingual tasks by learning a shallow downstream model. Similar to the SUPERB benchmark, we find speech SSL models can significantly improve performance compared to FBANK features. Furthermore, we find that multilingual models do not always perform better than their monolingual counterparts. We will release ML-SUPERB as a challenge with organized datasets and reproducible training scripts for future multilingual representation research.
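The frozen-feature setup described in the abstract can be illustrated with a toy language identification example. This is a minimal sketch under stated assumptions, not the ML-SUPERB training scripts: the "SSL encoder" is a hypothetical fixed random projection (`FROZEN_W`), the utterances are synthetic, and the downstream model is a single linear layer with softmax. Mean pooling over frames and the names `utterance_embedding`, `predict`, `train`, and `toy_utterance` are all illustrative choices. The point it shows is that only the shallow downstream weights are updated while the upstream features stay fixed.

```python
import math
import random

random.seed(0)

IN_DIM, FEAT_DIM, NUM_LANGS = 8, 4, 3  # toy sizes, chosen for illustration

# Hypothetical stand-in for a pre-trained SSL encoder: a fixed random
# projection whose weights are never updated during training.
FROZEN_W = [[random.uniform(-1, 1) for _ in range(IN_DIM)] for _ in range(FEAT_DIM)]

def utterance_embedding(frames):
    """Extract frozen frame-level features and mean-pool them over time."""
    feats = [[math.tanh(sum(w * x for w, x in zip(row, f))) for row in FROZEN_W]
             for f in frames]
    return [sum(col) / len(feats) for col in zip(*feats)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, emb):
    """Shallow downstream model: one linear layer followed by softmax."""
    return softmax([sum(w * e for w, e in zip(row, emb)) + bk
                    for row, bk in zip(W, b)])

def mean_loss(W, b, data):
    """Average cross-entropy over (embedding, label) pairs."""
    return sum(-math.log(predict(W, b, emb)[y]) for emb, y in data) / len(data)

def train(data, steps=200, lr=0.1):
    """Full-batch gradient descent on the downstream weights only."""
    W = [[0.0] * FEAT_DIM for _ in range(NUM_LANGS)]
    b = [0.0] * NUM_LANGS
    for _ in range(steps):
        gW = [[0.0] * FEAT_DIM for _ in range(NUM_LANGS)]
        gb = [0.0] * NUM_LANGS
        for emb, y in data:
            p = predict(W, b, emb)
            for k in range(NUM_LANGS):
                g = (p[k] - (1.0 if k == y else 0.0)) / len(data)
                gb[k] += g
                for j in range(FEAT_DIM):
                    gW[k][j] += g * emb[j]
        for k in range(NUM_LANGS):
            b[k] -= lr * gb[k]
            for j in range(FEAT_DIM):
                W[k][j] -= lr * gW[k][j]
    return W, b

def toy_utterance(label, n_frames=5):
    """Synthetic frames whose mean depends on the (toy) language label."""
    return [[random.gauss(label - 1.0, 0.5) for _ in range(IN_DIM)]
            for _ in range(n_frames)]

labels = [0, 0, 0, 0, 1, 1, 1, 2, 2]
data = [(utterance_embedding(toy_utterance(y)), y) for y in labels]
frozen_snapshot = [row[:] for row in FROZEN_W]

zero_W = [[0.0] * FEAT_DIM for _ in range(NUM_LANGS)]
initial_loss = mean_loss(zero_W, [0.0] * NUM_LANGS, data)
W, b = train(data)
final_loss = mean_loss(W, b, data)
```

In a real setup the random projection would be replaced by a pre-trained SSL model run in inference mode, and an ASR downstream model would need a sequence-level objective rather than the pooled classifier used here.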
Related papers
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, And Datasets (2024)
- Findings Of The 2023 ML-SUPERB Challenge: Pre-training And Evaluation Over More Languages And Beyond (2023)
- SUPERB @ SLT 2022: Challenge On Generalization And Efficiency Of Self-supervised Speech Representation Learning (2022)
- SUPERB-SG: Enhanced Speech Processing Universal Performance Benchmark For Semantic And Generative Capabilities (2022)
- Characterizing The Adversarial Vulnerability Of Speech Self-supervised Learning (2021)
- LeBenchmark: A Reproducible Framework For Assessing Self-supervised Representation Learning From Speech (2021)
- LeBenchmark 2.0: A Standardized, Replicable And Enhanced Framework For Self-supervised Representations Of French Speech (2023)
- Dynamic-SUPERB: Towards A Dynamic, Collaborative, And Comprehensive Instruction-tuning Benchmark For Speech (2023)