Improving Speech Enhancement Performance By Leveraging Contextual Broad Phonetic Class Information
2020 · Yen-Ju Lu, Chia-Yu Chang, Cheng Yu, et al.
Abstract
Previous studies have confirmed that by augmenting acoustic features with the place/manner of articulatory features, the speech enhancement (SE) process can be guided to consider the broad phonetic properties of the input speech when performing enhancement to attain performance improvements. In this paper, we explore the contextual information of articulatory attributes as additional information to further benefit SE. More specifically, we propose to improve the SE performance by leveraging losses from an end-to-end automatic speech recognition (E2E-ASR) model that predicts the sequence of broad phonetic classes (BPCs). We also developed multi-objective training with ASR and perceptual losses to train the SE system based on a BPC-based E2E-ASR. Experimental results from speech denoising, speech dereverberation, and impaired speech enhancement tasks confirmed that contextual BPC information improves SE performance. Moreover, the SE model trained with the BPC-based E2E-ASR outperforms th
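The two ideas in the abstract, collapsing phonemes into broad phonetic classes and training the SE model against several losses at once, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the class inventory in `BPC_BY_MANNER`, the helper names, the weights `alpha` and `beta`, and the weighted-sum form of the objective are all assumptions introduced here.

```python
# Hypothetical grouping of phonemes by manner of articulation.
# The class inventory used in the paper may differ.
BPC_BY_MANNER = {
    "vowel": {"aa", "ae", "ah", "eh", "ih", "iy", "uw"},
    "stop": {"b", "d", "g", "p", "t", "k"},
    "fricative": {"s", "z", "f", "v", "sh", "th"},
    "nasal": {"m", "n", "ng"},
    "semivowel": {"l", "r", "w", "y"},
}

# Invert the table so each phoneme maps to its broad class.
PHONE_TO_BPC = {
    phone: bpc for bpc, phones in BPC_BY_MANNER.items() for phone in phones
}


def to_bpc_sequence(phones):
    """Map a phoneme sequence to a BPC sequence (unknown phones -> 'other')."""
    return [PHONE_TO_BPC.get(p, "other") for p in phones]


def multi_objective_loss(se_loss, asr_loss, perceptual_loss, alpha=0.1, beta=0.1):
    """Combine the SE loss with ASR and perceptual losses as a weighted sum.

    alpha and beta are hypothetical weights, not values from the paper.
    """
    return se_loss + alpha * asr_loss + beta * perceptual_loss


if __name__ == "__main__":
    print(to_bpc_sequence(["s", "iy", "t"]))
    print(multi_objective_loss(1.0, 2.0, 3.0, alpha=0.5, beta=0.1))
```

In a setup like this, the ASR targets would be the BPC sequence rather than the phoneme or character sequence, so the recognizer's loss reflects broad phonetic structure instead of fine-grained identity.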
Related papers
- Improving Neural Biasing For Contextual Speech Recognition By Early Context Injection And Text Perturbation (2024) 8.09
- Dynamic Acoustic Compensation And Adaptive Focal Training For Personalized Speech Enhancement (2022) 4.52
- Cross-domain Single-channel Speech Enhancement Model With Bi-projection Fusion Module For Noise-robust ASR (2021) 8.09
- Human Listening And Live Captioning: Multi-task Training For Speech Enhancement (2021) 9.92
- Deep Context: End-to-end Contextual Speech Recognition (2018) 15.57
- Magnitude-phase Dual-path Speech Enhancement Network Based On Self-supervised Embedding And Perceptual Contrast Stretch Boosting (2025) 3.21
- SEF-PNet: Speaker Encoder-free Personalized Speech Enhancement With Local And Global Contexts Aggregation (2025) 2.26
- Multi-objective Learning And Mask-based Post-processing For Deep Neural Network Based Speech Enhancement (2017) 9.76