Time-domain Multi-modal Bone/air Conducted Speech Enhancement
2019 · Cheng Yu, Kuo-Hsuan Hung, Syu-Siang Wang, et al.
Abstract
Previous studies have shown that integrating video signals as a complementary modality can improve speech enhancement (SE) performance. However, video clips usually contain large amounts of data and are computationally expensive to process, which may complicate the SE system. As an alternative source, a bone-conducted speech signal has a moderate data size while still exhibiting speech-phoneme structures, and thus complements its air-conducted counterpart. In this study, we propose a novel multi-modal SE structure in the time domain that leverages bone- and air-conducted signals. In addition, we examine two ensemble-learning-based strategies, early fusion (EF) and late fusion (LF), to integrate the two types of speech signals, and adopt a deep learning-based fully convolutional network to conduct the enhancement. The experimental results on the Mandarin corpus indicate that this newly presented multi-modal (integrating bone- and air-conducted signals) SE struc
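The difference between the two fusion strategies named in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' network: it assumes early fusion means stacking the air- and bone-conducted waveforms as input channels of one model, and late fusion means running one model per signal and combining the outputs. A single linear 1-D convolution stands in for the fully convolutional network, and the names `conv1d`, `early_fusion`, `late_fusion`, and the mixing weights are hypothetical.

```python
from typing import List, Sequence

Signal = List[float]


def conv1d(channels: Sequence[Signal], kernels: Sequence[Sequence[float]]) -> Signal:
    """Zero-padded, same-length 1-D convolution summed over input channels.

    Stand-in for a fully convolutional network (assumption: one linear layer).
    """
    length = len(channels[0])
    half = len(kernels[0]) // 2
    out = [0.0] * length
    for channel, kernel in zip(channels, kernels):
        for t in range(length):
            for k, weight in enumerate(kernel):
                idx = t + k - half
                if 0 <= idx < length:  # samples outside the signal count as zero
                    out[t] += weight * channel[idx]
    return out


def early_fusion(air: Signal, bone: Signal,
                 kernels: Sequence[Sequence[float]]) -> Signal:
    """Fuse at the input: both waveforms are channels of a single model."""
    return conv1d([air, bone], kernels)


def late_fusion(air: Signal, bone: Signal,
                air_kernel: Sequence[float], bone_kernel: Sequence[float],
                w_air: float = 0.5, w_bone: float = 0.5) -> Signal:
    """Fuse at the output: one model per signal, then a weighted combination.

    The fixed mixing weights are hypothetical; a learned fusion stage is
    equally possible.
    """
    air_out = conv1d([air], [air_kernel])
    bone_out = conv1d([bone], [bone_kernel])
    return [w_air * a + w_bone * b for a, b in zip(air_out, bone_out)]


if __name__ == "__main__":
    air = [1.0, 2.0, 3.0, 4.0]    # toy air-conducted waveform
    bone = [0.0, 1.0, 0.0, 1.0]   # toy bone-conducted waveform
    identity = [0.0, 1.0, 0.0]    # kernel that passes the signal through
    print(early_fusion(air, bone, [identity, identity]))
    print(late_fusion(air, bone, identity, identity))
```

With pass-through kernels, early fusion reduces to a sample-wise sum of the two waveforms and late fusion to their weighted average, which makes the point at which the two signals meet easy to see.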
Related papers
- Cross-domain Single-channel Speech Enhancement Model With Bi-projection Fusion Module For Noise-robust ASR (2021)
- A Study Of Incorporating Articulatory Movement Information In Speech Enhancement (2020)
- Audio-visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks (2017)
- Time-domain Speech Enhancement Assisted By Multi-resolution Frequency Encoder And Decoder (2023)
- Toward Universal Speech Enhancement For Diverse Input Conditions (2023)
- Forknet: Simultaneous Time And Time-frequency Domain Modeling For Speech Enhancement (2023)
- Human Listening And Live Captioning: Multi-task Training For Speech Enhancement (2021)
- FB-MSTCN: A Full-band Single-channel Speech Enhancement Method Based On Multi-scale Temporal Convolutional Network (2022)