Importance Of Smoothness Induced By Optimizers In FL4ASR: Towards Understanding Federated Learning For End-to-end ASR
2023 · Sheikh Shams Azam, Tatiana Likhomanenko, Martin Pelikan, et al.
Abstract
In this paper, we start by training End-to-End Automatic Speech Recognition (ASR) models using Federated Learning (FL) and examining the fundamental considerations that can be pivotal in minimizing the performance gap in terms of word error rate between models trained using FL versus their centralized counterpart. Specifically, we study the effect of (i) adaptive optimizers, (ii) loss characteristics via altering Connectionist Temporal Classification (CTC) weight, (iii) model initialization through seed start, (iv) carrying over modeling setup from experiences in centralized training to FL, e.g., pre-layer or post-layer normalization, and (v) FL-specific hyperparameters, such as number of local epochs, client sampling size, and learning rate scheduler, specifically for ASR under heterogeneous data distribution. We shed light on how some optimizers work better than others via inducing smoothness. We also summarize the applicability of algorithms, trends, and propose best practices from
Related papers
- The Gift Of Feedback: Improving ASR Model Quality By Learning From User Corrections Through Federated Learning (2023) 0.00
- Enabling Differentially Private Federated Learning For Speech Recognition: Benchmarks, Adaptive Optimizers And Gradient Clipping (2023) 2.56
- Communication-efficient Personalized Federated Learning For Speech-to-text Tasks (2024) 7.81
- Continual Learning For Monolingual End-to-end Automatic Speech Recognition (2021) 7.16
- Towards Fair ASR For Second Language Speakers Using Fairness Prompted Finetuning (2025) 0.00
- Reducing Geographic Disparities In Automatic Speech Recognition Via Elastic Weight Consolidation (2022) 2.26
- Exploring Heterogeneous Characteristics Of Layers In ASR Models For More Efficient Training (2021) 2.26
- Fine-tuning Strategies For Faster Inference Using Speech Self-supervised Models: A Comparative Study (2023) 8.35