Speech Enhancement Deep-learning Architecture For Efficient Edge Processing
2024 · Monisankha Pal, Arvind Ramanathan, Ted Wada, et al.
Abstract
Deep learning has become the de facto method of choice for speech enhancement, bringing significant improvements in speech quality. However, shrinking model size and computation to allow real-time processing on low-power edge devices drastically degrades speech quality. Recently, transformer-based architectures have greatly reduced memory requirements and provided ways to improve model performance through local and global contexts, but transformer operations remain computationally heavy. In this work, we introduce a WaveUNet squeeze-excitation Res2 (WSR)-based metric generative adversarial network (WSR-MGAN) architecture that can be implemented efficiently on low-power edge devices for noise suppression while maintaining speech quality. We exploit multi-scale features through Res2Net blocks, which can be related to the spectral content used in speech-processing tasks. In the generator, we integrate squeeze-excitation blocks (SEB) with multi-scale features for maintaining local and gl
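As a rough illustration of the squeeze-excitation idea named in the abstract, the sketch below applies the generic SE recipe (global average over time, a small bottleneck, then per-channel sigmoid gates) to a feature map of shape (channels, frames). It is not the WSR-MGAN generator: the function name `squeeze_excitation`, the reduction ratio, the tensor shapes, and the random weights are all assumptions made for the example.

```python
import numpy as np

def squeeze_excitation(x, w1, w2):
    """Generic SE gating for a (channels, frames) feature map.

    w1: (channels // r, channels) bottleneck weights (hypothetical values).
    w2: (channels, channels // r) expansion weights (hypothetical values).
    """
    z = x.mean(axis=1)                     # squeeze: one summary value per channel
    h = np.maximum(w1 @ z, 0.0)            # bottleneck with ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # excitation: sigmoid gates in (0, 1)
    return x * s[:, None], s               # rescale each channel by its gate

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
channels, frames, reduction = 8, 16, 4
x = rng.standard_normal((channels, frames))
w1 = 0.1 * rng.standard_normal((channels // reduction, channels))
w2 = 0.1 * rng.standard_normal((channels, channels // reduction))

y, gates = squeeze_excitation(x, w1, w2)
```

The gates depend on a summary of the whole time axis, so each channel is reweighted using global context while the feature map itself keeps its local detail. The extra cost is two small matrix-vector products per block.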
Related papers
- Study Of Lightweight Transformer Architectures For Single-channel Speech Enhancement (2025)
- UNetGAN: A Robust Speech Enhancement Approach In Time Domain For Extremely Low Signal-to-noise Ratio Condition (2020)
- Dense-TSNet: Dense Connected Two-stage Structure For Ultra-lightweight Speech Enhancement (2024)
- SEGAN: Speech Enhancement Generative Adversarial Network (2017)
- DCCRGAN: Deep Complex Convolution Recurrent Generator Adversarial Network For Speech Enhancement (2020)
- Wave-U-Net Discriminator: Fast And Lightweight Discriminator For Generative Adversarial Network-based Speech Synthesis (2023)
- Optimizing Speech Recognition For The Edge (2019)
- Parallel Gated Neural Network With Attention Mechanism For Speech Enhancement (2022)