Exploring Heterogeneous Characteristics Of Layers In ASR Models For More Efficient Training
2021 · Lillian Zhou, Dhruv Guliani, Andreas Kabel, et al.
Abstract
Transformer-based architectures have been the subject of research aimed at understanding their overparameterization and the non-uniform importance of their layers. Applying these approaches to Automatic Speech Recognition, we demonstrate that the state-of-the-art Conformer models generally have multiple ambient layers. We study the stability of these layers across runs and model sizes, propose that group normalization may be used without disrupting their formation, and examine their correlation with model weight updates in each layer. Finally, we apply these findings to Federated Learning in order to improve the training procedure, by targeting Federated Dropout to layers by importance. This allows us to reduce the model size optimized by clients without quality degradation, and shows potential for future exploration.
Related papers
- Transformer-based ASR Incorporating Time-reduction Layer And Fine-tuning With Self-knowledge Distillation (2021) 6.34
- How Redundant Is The Transformer Stack In Speech Representation Models? (2024) 2.26
- Towards A Unified Conformer Structure: From ASR To ASV Task (2022) 13.11
- Importance Of Smoothness Induced By Optimizers In FL4ASR: Towards Understanding Federated Learning For End-to-end ASR (2023) 0.00
- Domain Adaptation Of Low-resource Target-domain Models Using Well-trained ASR Conformer Models (2022) 4.52
- EfficientASR: Speech Recognition Network Compression Via Attention Redundancy And Chunk-level FFN Optimization (2024) 3.58
- Accurate And Structured Pruning For Efficient Automatic Speech Recognition (2023) 7.81
- Towards Effective And Compact Contextual Representation For Conformer Transducer Speech Recognition Systems (2023) 7.16