Study Of Lightweight Transformer Architectures For Single-channel Speech Enhancement
2025 · Haixin Zhao, Nilesh Madhu
Abstract
In speech enhancement, achieving state-of-the-art (SotA) performance while adhering to the computational constraints on edge devices remains a formidable challenge. Networks integrating stacked temporal and spectral modelling effectively leverage improved architectures such as transformers; however, they inevitably incur substantial computational complexity and model expansion. Through systematic ablation analysis on transformer-based temporal and spectral modelling, we demonstrate that the architecture employing streamlined Frequency-Time-Frequency (FTF) stacked transformers efficiently learns global dependencies within causal context, while avoiding considerable computational demands. Utilising discriminators in training further improves learning efficacy and enhancement without introducing additional complexity during inference. The proposed lightweight, causal, transformer-based architecture with adversarial training (LCT-GAN) yields SotA performance on instrumental metrics among c
Related papers
- Dense-TSNet: Dense Connected Two-stage Structure For Ultra-lightweight Speech Enhancement (2024) · 0.00
- Speech Enhancement Deep-learning Architecture For Efficient Edge Processing (2024) · 0.00
- TSTNN: Two-stage Transformer Based Neural Network For Speech Enhancement In The Time Domain (2021) · 16.73
- DeFTAN-II: Efficient Multichannel Speech Enhancement With Subgroup Processing (2023) · 7.16
- T-GSA: Transformer With Gaussian-weighted Self-attention For Speech Enhancement (2019) · 15.95
- Boosting Objective Scores Of A Speech Enhancement Model By MetricGAN Post-processing (2020) · 0.00
- Frame-stacked Local Transformers For Efficient Multi-codebook Speech Generation (2025) · 0.00
- Lightweight And Efficient End-to-end Speech Recognition Using Low-rank Transformer (2019) · 0.00