Multi-modal Hybrid Deep Neural Network For Speech Enhancement
2016 · Zhenzhou Wu, Sunil Sivadas, Yong Kiam Tan, et al.
Abstract
Deep Neural Networks (DNNs) have been successful in enhancing noisy speech signals. Enhancement is achieved by learning a nonlinear mapping function from the features of the corrupted speech signal to those of the reference clean speech signal. The quality of the predicted features can be improved by providing additional side-channel information that is robust to noise, such as visual cues. In this paper we propose a novel deep learning model inspired by insights from human audio-visual perception. In the proposed unified hybrid architecture, features from a Convolutional Neural Network (CNN) that processes the visual cues and features from a fully connected DNN that processes the audio signal are integrated using a Bidirectional Long Short-Term Memory (BiLSTM) network. The parameters of the hybrid model are jointly learned using backpropagation. We compare the quality of enhanced speech from the hybrid models with that from traditional DNN and BiLSTM models.
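The data flow described in the abstract can be illustrated with a minimal NumPy forward pass. This is a sketch under stated assumptions, not the authors' implementation: the abstract does not give layer sizes, the fusion operation, or the loss, so the dimensions, the concatenation of audio and visual features before the BiLSTM, the single-layer CNN stand-in, the mean squared error, and every function and parameter name below are hypothetical. Training by backpropagation is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """Fully connected layer with ReLU; x has shape (T, in_dim)."""
    return np.maximum(x @ w + b, 0.0)

def conv2d_valid(img, kernel):
    """Single-channel 'valid' 2-D cross-correlation of one frame."""
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def visual_features(frames, kernels):
    """Tiny CNN stand-in: conv, ReLU, global max pool -> (T, n_kernels)."""
    return np.array([[np.maximum(conv2d_valid(f, k), 0.0).max()
                      for k in kernels] for f in frames])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_params(in_dim, hidden):
    """Random weights for one LSTM direction (gates stacked as i, f, o, g)."""
    return {"W": rng.normal(0.0, 0.1, (4 * hidden, in_dim)),
            "U": rng.normal(0.0, 0.1, (4 * hidden, hidden)),
            "b": np.zeros(4 * hidden)}

def lstm_pass(x, p, reverse=False):
    """Run one LSTM direction over x of shape (T, in_dim) -> (T, hidden)."""
    hidden = p["b"].shape[0] // 4
    h, c = np.zeros(hidden), np.zeros(hidden)
    out = np.zeros((len(x), hidden))
    steps = range(len(x) - 1, -1, -1) if reverse else range(len(x))
    for t in steps:
        i, f, o, g = np.split(p["W"] @ x[t] + p["U"] @ h + p["b"], 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        out[t] = h
    return out

def hybrid_forward(noisy_audio, frames, params):
    """Audio DNN + visual CNN features, fused by a BiLSTM (assumed design)."""
    a = dense(noisy_audio, params["Wa"], params["ba"])      # (T, audio_hidden)
    v = visual_features(frames, params["kernels"])          # (T, n_kernels)
    fused = np.concatenate([a, v], axis=1)                  # assumed fusion
    fwd = lstm_pass(fused, params["fwd"])
    bwd = lstm_pass(fused, params["bwd"], reverse=True)
    h = np.concatenate([fwd, bwd], axis=1)                  # (T, 2 * hidden)
    return h @ params["Wo"] + params["bo"]                  # linear output

# Hypothetical sizes: 5 time steps, 40 audio features, 12x12 mouth-region frames.
T, audio_dim, audio_hidden, n_kernels, hidden = 5, 40, 16, 3, 8
params = {
    "Wa": rng.normal(0.0, 0.1, (audio_dim, audio_hidden)),
    "ba": np.zeros(audio_hidden),
    "kernels": rng.normal(0.0, 0.1, (n_kernels, 3, 3)),
    "fwd": lstm_params(audio_hidden + n_kernels, hidden),
    "bwd": lstm_params(audio_hidden + n_kernels, hidden),
    "Wo": rng.normal(0.0, 0.1, (2 * hidden, audio_dim)),
    "bo": np.zeros(audio_dim),
}
noisy = rng.normal(size=(T, audio_dim))     # stand-in noisy speech features
clean = rng.normal(size=(T, audio_dim))     # stand-in clean reference features
frames = rng.normal(size=(T, 12, 12))       # stand-in video frames

enhanced = hybrid_forward(noisy, frames, params)
mse = float(np.mean((enhanced - clean) ** 2))  # assumed regression loss
```

In a jointly trained model, the gradient of such a loss would flow back through the BiLSTM into both the audio and the visual branch, which is what distinguishes the hybrid from separately trained feature extractors.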
Related papers
- Audio-visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2017) · 17.39
- Consistency-aware Multi-channel Speech Enhancement Using Deep Neural Networks (2020) · 0.00
- Deep Neural Network Techniques For Monaural Speech Enhancement: State Of The Art Analysis (2022) · 0.00
- On The Role Of Spatial, Spectral, And Temporal Processing For DNN-based Non-linear Multi-channel Speech Enhancement (2022) · 7.81
- Exploring Deep Hybrid Tensor-to-vector Network Architectures For Regression Based Speech Enhancement (2020) · 7.50
- Insights Into Deep Non-linear Filters For Improved Multi-channel Speech Enhancement (2022) · 13.93
- How To Leverage DNN-based Speech Enhancement For Multi-channel Speaker Verification? (2022) · 0.00
- Deep Interaction Between Masking And Mapping Targets For Single-channel Speech Enhancement (2021) · 0.00