Revisit Micro-batch Clipping: Adaptive Data Pruning Via Gradient Manipulation
2024 · Lun Wang
Abstract
Micro-batch clipping, a gradient clipping method, has recently shown potential in enhancing automatic speech recognition (ASR) model performance. However, the mechanism behind this improvement remains unexplained, particularly the observation that only certain micro-batch sizes are beneficial. In this paper, we make the first attempt to explain this phenomenon. Inspired by recent data pruning research, we assume that specific training samples may impede model convergence during certain training phases. Under this assumption, the convergence analysis shows that micro-batch clipping can improve the convergence rate asymptotically, at the cost of an additional constant bias that does not diminish with more training iterations. The bias depends on a few factors and can be minimized at a specific micro-batch size, which explains the sweet-spot micro-batch size observed previously. We also verify the effectiveness of micro-batch clipping beyond speech models on v
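As a rough illustration of the technique the abstract names, the sketch below splits a batch into micro-batches, averages the gradient within each micro-batch, clips that average to a maximum L2 norm, and then averages the clipped results. It is a minimal sketch under stated assumptions, not the paper's implementation: the toy linear model, the squared loss, and the names `micro_batch_clipped_gradient`, `micro_batch_size`, and `max_norm` are all hypothetical choices made for the example.

```python
import math


def clip_to_norm(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [v * scale for v in vec]


def example_gradient(w, x, y):
    """Gradient of 0.5 * (w . x - y)^2 with respect to w (toy model, assumed)."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def micro_batch_clipped_gradient(w, batch, micro_batch_size, max_norm):
    """Average the per-micro-batch mean gradients after clipping each one."""
    clipped = []
    for start in range(0, len(batch), micro_batch_size):
        micro = batch[start:start + micro_batch_size]
        grads = [example_gradient(w, x, y) for x, y in micro]
        # Mean gradient over the examples in this micro-batch.
        mean_grad = [sum(col) / len(micro) for col in zip(*grads)]
        clipped.append(clip_to_norm(mean_grad, max_norm))
    # Mean over micro-batches of the clipped micro-batch gradients.
    return [sum(col) / len(clipped) for col in zip(*clipped)]


# Hypothetical data: the last example produces a very large gradient.
w = [0.0, 0.0]
batch = [
    ([1.0, 0.0], 1.0),
    ([0.0, 1.0], 1.0),
    ([1.0, 1.0], 2.0),
    ([10.0, 10.0], -50.0),
]
g = micro_batch_clipped_gradient(w, batch, micro_batch_size=2, max_norm=1.0)
unclipped = micro_batch_clipped_gradient(w, batch, micro_batch_size=2, max_norm=1e9)
```

In this toy run the micro-batch containing the outlier example is scaled down to the clipping threshold, so its influence on the update is bounded, while the other micro-batch passes through unchanged. That down-weighting of dominant samples is the sense in which the clipping can be read as a soft, adaptive form of data pruning; the choice of `micro_batch_size` controls how many samples share one clipping decision.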
Related papers
- Efficiently Train ASR Models That Memorize Less And Perform Better With Per-core Clipping (2024)
- Autoclip: Adaptive Gradient Clipping For Source Separation Networks (2020)
- Enabling Differentially Private Federated Learning For Speech Recognition: Benchmarks, Adaptive Optimizers And Gradient Clipping (2023)
- Accurate And Structured Pruning For Efficient Automatic Speech Recognition (2023)
- On Batching Variable Size Inputs For Training End-to-end Speech Enhancement Systems (2023)
- To Reverse The Gradient Or Not: An Empirical Comparison Of Adversarial And Multi-task Learning In Speech Recognition (2018)
- Personalized Lightweight Text-to-speech: Voice Cloning With Adaptive Structured Pruning (2023)
- SA: Sliding Attack For Synthetic Speech Detection With Resistance To Clipping And Self-splicing (2022)