Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders
2023 · Lester Phillip Violeta, Wen-Chin Huang, Ding Ma, et al.
Abstract
We propose a novel framework for electrolaryngeal (EL) speech intelligibility enhancement through the use of robust linguistic encoders. Pretraining and fine-tuning approaches have proven to work well for this task, but in most cases, various mismatches, such as the speech type mismatch (EL vs. typical) or a speaker mismatch between the datasets used in each stage, can degrade the conversion performance of this framework. To resolve this issue, we propose a linguistic encoder robust enough to project both EL and typical speech into the same latent space, while still being able to extract accurate linguistic information, creating a unified representation to reduce the speech type mismatch. Furthermore, we introduce HuBERT output features to the proposed framework to reduce the speaker mismatch, making it possible to effectively use a large-scale parallel dataset during pretraining. We show that compared to the conventional framework using mel-spectrogram input and output fe