Speech Self-supervised Representations Benchmarking: A Case For Larger Probing Heads
2023 · Salah Zaiem, Youcef Kemiche, Titouan Parcollet, et al.
Abstract
Self-supervised learning (SSL) leverages large datasets of unlabeled speech to reach impressive performance with reduced amounts of annotated data. The large number of proposed approaches has fostered the emergence of comprehensive benchmarks that evaluate their performance on a set of downstream tasks exploring various aspects of the speech signal. However, while the number of considered tasks has been growing, most proposals rely upon a single downstream architecture that maps the frozen SSL representations to the task labels. This study examines how benchmarking results are affected by changes in the probing head architecture. Interestingly, we find that altering the downstream architecture leads to significant fluctuations in the performance ranking of the evaluated models. Contrary to common practice in speech SSL benchmarking, we evaluate larger-capacity probing heads, showing their impact on performance, inference costs, generalization and multi-level feature exploitation.
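One way to quantify how much a model ranking shifts when the probing head changes is a rank correlation between the two score tables. The sketch below is an illustration only, not the authors' evaluation code: the helper names `rank_models` and `kendall_tau`, the model names, and all scores are hypothetical placeholders, and Kendall's tau is assumed here as the comparison metric.

```python
from itertools import combinations


def rank_models(scores, higher_is_better=True):
    """Return model names ordered from best to worst for one probing head."""
    return sorted(scores, key=scores.get, reverse=higher_is_better)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau (tau-a) between the rankings induced by two score tables.

    +1 means both probing heads order the models identically,
    -1 means the order is fully reversed. Tied pairs count as neither
    concordant nor discordant.
    """
    models = sorted(set(scores_a) & set(scores_b))
    if len(models) < 2:
        raise ValueError("need at least two shared models")
    concordant = discordant = 0
    for m, n in combinations(models, 2):
        product = (scores_a[m] - scores_a[n]) * (scores_b[m] - scores_b[n])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


# Placeholder scores invented for illustration; these are NOT results from the paper.
small_head = {"model_a": 0.71, "model_b": 0.68, "model_c": 0.74}
large_head = {"model_a": 0.80, "model_b": 0.78, "model_c": 0.79}

print("small head ranking:", rank_models(small_head))
print("large head ranking:", rank_models(large_head))
print("Kendall tau:", kendall_tau(small_head, large_head))
```

A tau well below 1 across probing heads would indicate that conclusions drawn from a single downstream architecture may not carry over to another.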
Related papers
- Lebenchmark: A Reproducible Framework For Assessing Self-supervised Representation Learning From Speech (2021)
- A Large-scale Probing Analysis Of Speaker-specific Attributes In Self-supervised Speech Representations (2025)
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023)
- Analyzing The Factors Affecting Usefulness Of Self-supervised Pre-trained Representations For Speech Recognition (2022)
- Lebenchmark 2.0: A Standardized, Replicable And Enhanced Framework For Self-supervised Representations Of French Speech (2023)
- SUPERB @ SLT 2022: Challenge On Generalization And Efficiency Of Self-supervised Speech Representation Learning (2022)
- ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, And Datasets (2024)
- Investigation Of Ensemble Features Of Self-supervised Pretrained Models For Automatic Speech Recognition (2022)