Towards Advanced Speech Signal Processing: A Statistical Perspective On Convolution-based Architectures And Its Applications
2024 · Nirmal Joshua Kapu, Raghav Karan
Abstract
This article surveys convolution-based models, including convolutional neural networks (CNNs), Conformers, ResNets, and CRNNs, as speech signal processing models, and provides their statistical background along with their applications in speech recognition, speaker identification, emotion recognition, and speech enhancement. Through a comparative assessment of training cost, model size, accuracy, and speed, we compare the strengths and weaknesses of each model, identify potential errors, and propose avenues for further research, emphasizing the central role that convolution-based architectures play in advancing applications of speech technologies.
Related papers
- SICRN: Advancing Speech Enhancement Through State Space Model And Inplace Convolution Techniques (2024) 7.81
- PCNN: A Lightweight Parallel Conformer Neural Network For Efficient Monaural Speech Enhancement (2023) 6.77
- What Do Neural Networks Listen To? Exploring The Crucial Bands In Speech Enhancement Using Sinc-convolution (2024) 2.26
- Constrained Convolutional-recurrent Networks To Improve Speech Quality With Low Impact On Recognition Accuracy (2018) 5.24
- Df-conformer: Integrated Architecture Of Conv-tasnet And Conformer Using Linear Complexity Self-attention For Speech Enhancement (2021) 11.29
- Analyzing Large Receptive Field Convolutional Networks For Distant Speech Recognition (2019) 5.84
- Contextnet: Improving Convolutional Neural Networks For Automatic Speech Recognition With Global Context (2020) 17.24
- Using Deep Learning Techniques And Inferential Speech Statistics For AI Synthesised Speech Recognition (2021) 0.00