Deep Learning Models In Speech Recognition: Measuring GPU Energy Consumption, Impact Of Noise And Model Quantization For Edge Deployment
2024 · Aditya Chakravarty
Abstract
Recent transformer-based ASR models have achieved word error rates (WER) below 4%, surpassing human annotator accuracy, yet they demand extensive server resources, contributing to significant carbon footprints. The traditional server-based architecture of ASR also presents privacy concerns, alongside reliability and latency issues due to network dependencies. In contrast, on-device (edge) ASR enhances privacy, boosts performance, and promotes sustainability by effectively balancing energy use and accuracy for specific applications. This study examines how quantization, memory demands, and energy consumption affect the inference performance of various ASR models on the NVIDIA Jetson Orin Nano. By analyzing WER and transcription speed across models using FP32, FP16, and INT8 quantization on clean and noisy datasets, we highlight the crucial trade-offs between accuracy, speed, quantization, energy efficiency, and memory needs. We found that changing precision from FP32 to FP16 halv
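To make the WER metric mentioned in the abstract concrete, here is a minimal Python sketch. It assumes the common word-level definition: substitutions, deletions, and insertions from a Levenshtein alignment, divided by the number of reference words. The function name `word_error_rate` is hypothetical and does not come from the paper, and the paper's own scoring may apply text normalization (casing, punctuation) that is not applied here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative only: splits on whitespace and applies no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.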
Related papers
- Optimizing Speech Recognition For The Edge (2019)
- Speech Enhancement Deep-learning Architecture For Efficient Edge Processing (2024)
- Neural Transducer Training: Reduced Memory Consumption With Sample-wise Computation (2022)
- Transfer Learning For Speech Recognition On A Budget (2017)
- Dyn-ASR: Compact, Multilingual Speech Recognition Via Spoken Language And Accent Identification (2021)
- A Model For Every User And Budget: Label-free And Personalized Mixed-precision Quantization (2023)
- EfficientASR: Speech Recognition Network Compression Via Attention Redundancy And Chunk-level FFN Optimization (2024)
- Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies Of Large End-to-end Models (2024)