A Study Of Incorporating Articulatory Movement Information In Speech Enhancement
2020 · Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, et al.
Abstract
Although deep learning algorithms are widely used to improve speech enhancement (SE), performance remains limited under highly challenging conditions, such as unseen noise types or low signal-to-noise ratios (SNRs). This study presents a pilot investigation of a novel multimodal audio-articulatory-movement SE (AAMSE) model intended to improve SE performance under such conditions. Articulatory movement features and acoustic signals were used as inputs to waveform-mapping-based and spectral-mapping-based SE systems with three fusion strategies. In addition, an ablation study was conducted to evaluate SE performance using a limited number of articulatory movement sensors. Experimental results confirm that, by combining the two modalities, the AAMSE model notably improves speech quality and intelligibility compared with conventional audio-only SE baselines.
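To make the "low SNR" condition concrete, the following is a minimal sketch of how a noisy test signal can be constructed at a chosen SNR using the standard definition SNR(dB) = 10 log10(P_clean / P_noise). It is not taken from the paper: the function names `mix_at_snr` and `measure_snr_db`, the synthetic sine "speech", the Gaussian noise, and the -5 dB target are all illustrative assumptions.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db`, then add it.

    Hypothetical helper for illustration; returns (noisy, scaled_noise).
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # Noise power required for the requested SNR.
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled_noise = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled_noise)]
    return noisy, scaled_noise


def measure_snr_db(clean, noise):
    """SNR in dB between a clean signal and an additive noise signal."""
    return 10 * math.log10(power(clean) / power(noise))


# Assumed stand-ins: a 220 Hz tone for speech and white Gaussian noise, 1 s at 16 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=-5.0)
achieved = measure_snr_db(clean, scaled_noise)
```

At a negative SNR such as this one, the noise carries more power than the speech, which is the kind of condition the abstract describes as highly challenging for audio-only SE.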
Related papers
- Time-domain Multi-modal Bone/air Conducted Speech Enhancement (2019) · 12.99
- Joint Training Of Speech Enhancement And Self-supervised Model For Noise-robust ASR (2022) · 0.00
- Incorporating Symbolic Sequential Modeling For Speech Enhancement (2019) · 0.00
- Bridging The Gap: Integrating Pre-trained Speech Enhancement And Recognition Models For Robust Speech Recognition (2024) · 7.50
- Audio-visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2017) · 17.39
- Human Listening And Live Captioning: Multi-task Training For Speech Enhancement (2021) · 9.92
- An Investigation Of Incorporating Mamba For Speech Enhancement (2024) · 13.70
- Unpaired Speech Enhancement By Acoustic And Adversarial Supervision For Speech Recognition (2018) · 10.21