Communication-efficient Personalized Federated Learning For Speech-to-text Tasks
2024 · Yichao Du, Zhirui Zhang, Linan Yue, et al.
Abstract
To protect privacy and meet legal regulations, federated learning (FL) has gained significant attention for training speech-to-text (S2T) systems, including automatic speech recognition (ASR) and speech translation (ST). However, the FL approach commonly used in S2T tasks (i.e., FedAvg) typically suffers from two problems: extensive communication overhead, because every one of its many interaction rounds exchanges the whole model, and performance degradation caused by data heterogeneity among clients. To address these issues, we propose a personalized federated S2T framework that introduces FedLoRA, a lightweight LoRA module used for client-side tuning and for interaction with the server to minimize communication overhead, and FedMem, a global model equipped with a k-nearest-neighbor (kNN) classifier that captures client-specific distributional shifts to achieve personalization and overcome data heterogeneity. Extensive experiments based on Conformer and Whisper backbone models on CoVoST
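The abstract names the two mechanisms but gives no formulas, so the sketch below illustrates them under stated assumptions; it is not the authors' code. For FedLoRA, it assumes each client tunes only a low-rank adapter B·A (B of shape d_out×r, A of shape r×d_in) on a frozen weight and sends just those matrices, which the server combines with ordinary dataset-size-weighted FedAvg. For FedMem, it assumes the kNN classifier follows the common kNN-MT recipe: retrieve the k nearest (hidden state, token) pairs from a datastore, turn their distances into a distribution with a temperature T, and mix it with the model's distribution using a weight λ. All function names (`lora_param_count`, `fedavg`, `knn_distribution`, `interpolate`) and hyperparameters are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical illustration only; not the FedLoRA/FedMem reference implementation.

# --- FedLoRA-style communication: only low-rank adapter weights are exchanged ---

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in a LoRA adapter B @ A for one d_out x d_in weight matrix.

    B has shape (d_out, rank) and A has shape (rank, d_in).
    """
    return rank * (d_in + d_out)


def fedavg(client_params: list[dict[str, list[float]]],
           client_sizes: list[int]) -> dict[str, list[float]]:
    """Average each named (flattened) parameter, weighting clients by local dataset size."""
    total = sum(client_sizes)
    averaged: dict[str, list[float]] = {}
    for name in client_params[0]:
        length = len(client_params[0][name])
        averaged[name] = [
            sum(p[name][i] * n for p, n in zip(client_params, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# --- FedMem-style personalization (assumed kNN-MT formulation) ---

def knn_distribution(query: list[float],
                     datastore: list[tuple[list[float], str]],
                     k: int, temperature: float) -> dict[str, float]:
    """p_kNN(y) from the k nearest (hidden_state, token) entries via softmax(-d^2 / T)."""
    def sq_dist(a: list[float], b: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    neighbors = sorted(datastore, key=lambda entry: sq_dist(query, entry[0]))[:k]
    scores = [-sq_dist(query, key) / temperature for key, _ in neighbors]
    max_score = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - max_score) for s in scores]
    z = sum(weights)
    probs: dict[str, float] = defaultdict(float)
    for (_, token), w in zip(neighbors, weights):
        probs[token] += w / z  # neighbors sharing a token pool their mass
    return dict(probs)


def interpolate(p_model: dict[str, float], p_knn: dict[str, float],
                lam: float) -> dict[str, float]:
    """p(y) = lam * p_kNN(y) + (1 - lam) * p_model(y)."""
    vocab = set(p_model) | set(p_knn)
    return {y: lam * p_knn.get(y, 0.0) + (1 - lam) * p_model.get(y, 0.0) for y in vocab}


if __name__ == "__main__":
    # One 512 x 512 projection: full weight count vs. a rank-8 adapter.
    print(512 * 512, lora_param_count(512, 512, 8))  # 262144 8192
```

In this sketch a rank-8 adapter on a 512×512 projection sends 8,192 values per round instead of 262,144, which is the kind of saving the abstract attributes to exchanging only the LoRA module. The kNN part adds no trainable parameters; personalization comes from the contents of the datastore and the choice of λ, k, and T.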
Related papers
- Fedspeech: Federated Text-to-speech With Continual Learning (2021) · 9.23
- Private Language Model Adaptation For Speech Recognition (2021) · 0.00
- Fednst: Federated Noisy Student Training For Automatic Speech Recognition (2022) · 6.77
- Fed-pisa: Federated Voice Cloning Via Personalized Identity-style Adaptation (2025) · 0.00
- Semi-fedser: Semi-supervised Learning For Speech Emotion Recognition On Federated Learning Using Multiview Pseudo-labeling (2022) · 8.82
- The Gift Of Feedback: Improving ASR Model Quality By Learning From User Corrections Through Federated Learning (2023) · 0.00
- Federated Marginal Personalization For ASR Rescoring (2020) · 2.26
- Training Speech Recognition Models With Federated Learning: A Quality/cost Framework (2020) · 12.93