Exploring Deep Hybrid Tensor-to-vector Network Architectures For Regression Based Speech Enhancement
2020 · Jun Qi, Hu Hu, Yannan Wang, et al.
Abstract
This paper investigates trade-offs between the number of model parameters and enhanced speech quality using several deep tensor-to-vector regression models for speech enhancement. We find that a hybrid architecture, CNN-TT, maintains good quality with a reduced parameter count. CNN-TT consists of several convolutional layers at the bottom for feature extraction, which improve speech quality, and a tensor-train (TT) output layer on top, which reduces the number of model parameters. We first derive a new upper bound on the generalization power of convolutional neural network (CNN) based vector-to-vector regression models. We then provide experimental evidence on the Edinburgh noisy speech corpus demonstrating that, in single-channel speech enhancement, CNN outperforms DNN at the expense of a small increase in model size. Moreover, CNN-TT slightly outperforms its CNN counterpart while using only 32% of the CNN model parameters.
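To make the parameter saving of a tensor-train output layer concrete, the following is a minimal NumPy sketch of a TT-factorized weight matrix. The mode sizes, TT ranks, and function names are illustrative assumptions and are not taken from the paper; the sketch only shows the general TT-matrix idea, not the authors' CNN-TT implementation or their 32% figure, which refers to the whole model.

```python
import numpy as np


def dense_param_count(in_modes, out_modes):
    """Weights in an ordinary dense layer of size prod(in_modes) x prod(out_modes)."""
    return int(np.prod(in_modes)) * int(np.prod(out_modes))


def tt_param_count(in_modes, out_modes, ranks):
    """Weights in a TT-matrix with cores of shape (r_{k-1}, m_k, n_k, r_k).

    `ranks` has one more entry than the number of modes, with ranks[0] == ranks[-1] == 1.
    """
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


def make_tt_cores(in_modes, out_modes, ranks, seed=0):
    """Random TT cores (hypothetical initialization, for illustration only)."""
    rng = np.random.default_rng(seed)
    return [
        0.1 * rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
        for k in range(len(in_modes))
    ]


def tt_to_matrix(cores):
    """Contract the cores into the full weight matrix (only for checking)."""
    full = cores[0]  # shape (1, m_1, n_1, r_1)
    for core in cores[1:]:
        # full: (1, M, N, r), core: (r, m, n, r') -> (1, M, m, N, n, r')
        full = np.einsum("aijr,rklb->aikjlb", full, core)
        a, M, m, N, n, b = full.shape
        full = full.reshape(a, M * m, N * n, b)
    return full[0, :, :, 0]


def tt_forward(x, cores):
    """Apply the TT layer to x of shape (batch, prod(in_modes)) without building the full matrix."""
    batch = x.shape[0]
    # Axes: (batch, output modes done so far, current TT rank, input modes remaining)
    t = x.reshape(batch, 1, 1, -1)
    for core in cores:
        r, m, n, r_next = core.shape
        t = t.reshape(batch, t.shape[1], r, m, -1)
        # Contract the current rank and input mode; emit an output mode and the next rank.
        t = np.einsum("bprmq,rmns->bpnsq", t, core)
        t = t.reshape(batch, -1, r_next, t.shape[-1])
    return t.reshape(batch, -1)


# Illustrative configuration (an assumption, not the paper's): a 256 -> 256 layer.
in_modes, out_modes, ranks = (4, 8, 8), (4, 8, 8), (1, 4, 4, 1)
cores = make_tt_cores(in_modes, out_modes, ranks)
x = np.random.default_rng(1).standard_normal((5, int(np.prod(in_modes))))
y = tt_forward(x, cores)
```

In this toy configuration the TT cores hold far fewer weights than the equivalent dense matrix, and `tt_forward` gives the same result as multiplying by the reconstructed matrix from `tt_to_matrix`. Larger ranks move the layer closer to a dense layer in both capacity and size.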
Related papers
- Tensor-to-vector Regression For Multi-channel Speech Enhancement Based On Tensor-train Network (2020) 12.39
- Exploiting Low-rank Tensor-train Deep Neural Networks Based On Riemannian Gradient Descent With Illustrations Of Speech Processing (2022) 0.00
- Multi-modal Hybrid Deep Neural Network For Speech Enhancement (2016) 0.00
- Tensor-train Long Short-term Memory For Monaural Speech Enhancement (2018) 0.00
- Dense-TSNet: Dense Connected Two-stage Structure For Ultra-lightweight Speech Enhancement (2024) 0.00
- TFCN: Temporal-frequential Convolutional Network For Single-channel Speech Enhancement (2022) 0.00
- PCNN: A Lightweight Parallel Conformer Neural Network For Efficient Monaural Speech Enhancement (2023) 6.77
- TSTNN: Two-stage Transformer Based Neural Network For Speech Enhancement In The Time Domain (2021) 16.73