Efficiently Train ASR Models That Memorize Less And Perform Better With Per-core Clipping
2024 · Lun Wang, Om Thakkar, Zhong Meng, et al.
Abstract
Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models. It is typically applied to minibatch gradients to prevent gradient explosion, and to individual sample gradients to mitigate unintended memorization. This work systematically investigates the impact of a specific granularity of gradient clipping, namely per-core clipping (PCC), across the training of a wide range of ASR models. We empirically demonstrate that PCC can effectively mitigate unintended memorization in ASR models. Surprisingly, we find that PCC also positively influences ASR performance metrics, leading to improved convergence rates and reduced word error rates. To avoid tuning the additional hyperparameter introduced by PCC, we further propose a novel variant, adaptive per-core clipping (APCC), for streamlined optimization. Our findings highlight the multifaceted benefits of PCC as a strategy for robust, privacy-forward ASR model training.
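To make the notion of clipping granularity concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes that per-core clipping means: split the minibatch evenly across cores, average the gradients on each core, clip each core's averaged gradient to an L2 norm bound, then average across cores. The function names, the `clip_bound` parameter, and the list-of-floats gradient representation are all hypothetical and chosen for illustration only.

```python
import math


def l2_norm(vec):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(x * x for x in vec))


def clip_to_norm(vec, bound):
    """Scale vec down so its L2 norm is at most bound; leave it unchanged otherwise."""
    norm = l2_norm(vec)
    scale = min(1.0, bound / norm) if norm > 0.0 else 1.0
    return [x * scale for x in vec]


def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def per_core_clipped_gradient(per_example_grads, num_cores, clip_bound):
    """Assumed PCC-style aggregation (illustrative, not the paper's code).

    1. Shard the minibatch evenly across `num_cores`.
    2. Average the example gradients within each shard.
    3. Clip each per-core average to `clip_bound`.
    4. Average the clipped per-core gradients.
    """
    if len(per_example_grads) % num_cores != 0:
        raise ValueError("batch size must be divisible by num_cores")
    shard_size = len(per_example_grads) // num_cores
    clipped = []
    for core in range(num_cores):
        shard = per_example_grads[core * shard_size:(core + 1) * shard_size]
        clipped.append(clip_to_norm(mean_vectors(shard), clip_bound))
    return mean_vectors(clipped)


# Four toy example gradients, two per core.
grads = [[3.0, 4.0], [3.0, 4.0], [0.3, 0.4], [0.3, 0.4]]
pcc_grad = per_core_clipped_gradient(grads, num_cores=2, clip_bound=1.0)
```

In this sketch, `num_cores=1` reduces to ordinary minibatch clipping, and setting `num_cores` to the batch size reduces to per-sample clipping, so the per-core setting sits between the two granularities mentioned in the abstract. The bound `clip_bound` stands in for the additional hyperparameter that the abstract says PCC introduces.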
Related papers
- Revisit Micro-batch Clipping: Adaptive Data Pruning Via Gradient Manipulation (2024)
- Enabling Differentially Private Federated Learning For Speech Recognition: Benchmarks, Adaptive Optimizers And Gradient Clipping (2023)
- Continual Learning For Monolingual End-to-end Automatic Speech Recognition (2021)
- Rehearsal-free Online Continual Learning For Automatic Speech Recognition (2023)
- Autoclip: Adaptive Gradient Clipping For Source Separation Networks (2020)
- Residual Adapters For Parameter-efficient ASR Adaptation To Atypical And Accented Speech (2021)
- Continual Learning Optimizations For Auto-regressive Decoder Of Multilingual ASR Systems (2024)
- Guided Contrastive Self-supervised Pre-training For Automatic Speech Recognition (2022)