End-to-end Whisper To Natural Speech Conversion Using Modified Transformer Network
2020 · Abhishek Niranjan, Mukesh Sharma, Sai Bharath Chandra Gutha, et al.
Abstract
Machine recognition of atypical speech, such as whispered speech, is a challenging task. We introduce whisper-to-natural-speech conversion using a sequence-to-sequence approach, proposing an enhanced transformer architecture that uses both parallel and non-parallel data. We investigate different features, such as Mel-frequency cepstral coefficients and smoothed spectral features. The proposed networks are trained end-to-end with a supervised approach for feature-to-feature transformation. Further, we investigate the effectiveness of an embedded auxiliary decoder, placed after N encoder sub-layers and trained with a frame-level objective function for identifying source phoneme labels. We report results on the open-source wTIMIT and CHAINS datasets by measuring word error rate (WER) with an end-to-end ASR system, as well as BLEU scores for the generated speech. In addition, we propose a novel method to measure the spectral shape of the generated speech by measuring formant distributions with respect to reference speech, as formant divergence me…
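In the setup the abstract describes, an auxiliary decoder attached after N encoder sub-layers is trained with a frame-level phoneme-identification objective alongside the main feature-to-feature conversion loss. The sketch below shows one plausible way such a multi-task objective could be combined. The L1 feature loss, the cross-entropy form, the function names (`l1_feature_loss`, `frame_phoneme_ce`, `joint_loss`) and the weight `aux_weight=0.3` are all hypothetical choices for illustration, not details taken from the paper. It uses plain Python on nested lists so it runs without a deep-learning framework.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one feature vector, e.g. the MFCCs of a single frame


def l1_feature_loss(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Mean absolute error between predicted and reference feature frames.

    Assumption: an L1 regression loss on the converted (natural-speech) features.
    """
    if len(pred) != len(target) or not pred:
        raise ValueError("need the same, non-zero number of predicted and target frames")
    total, count = 0.0, 0
    for p_frame, t_frame in zip(pred, target):
        if len(p_frame) != len(t_frame):
            raise ValueError("feature dimensions differ")
        for p, t in zip(p_frame, t_frame):
            total += abs(p - t)
            count += 1
    return total / count


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtracts the max before exponentiating)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def frame_phoneme_ce(logits_per_frame: Sequence[Sequence[float]],
                     labels: Sequence[int]) -> float:
    """Average cross-entropy of the auxiliary phoneme classifier over source (whispered) frames."""
    if len(logits_per_frame) != len(labels) or not labels:
        raise ValueError("need one phoneme label per source frame")
    nll = 0.0
    for logits, label in zip(logits_per_frame, labels):
        nll -= log_softmax(logits)[label]
    return nll / len(labels)


def joint_loss(pred_feats: Sequence[Frame], target_feats: Sequence[Frame],
               phone_logits: Sequence[Sequence[float]], phone_labels: Sequence[int],
               aux_weight: float = 0.3) -> float:
    """Hypothetical combined objective: feature regression plus weighted auxiliary phoneme loss."""
    return (l1_feature_loss(pred_feats, target_feats)
            + aux_weight * frame_phoneme_ce(phone_logits, phone_labels))


# Usage: two output frames with 2-dim features, two source frames over 3 phoneme classes.
loss = joint_loss(
    pred_feats=[[1.0, 2.0], [0.5, 0.5]],
    target_feats=[[0.0, 4.0], [0.5, 0.5]],
    phone_logits=[[2.0, 0.0, -1.0], [0.0, 0.0, 0.0]],
    phone_labels=[0, 2],
)
print(f"joint loss: {loss:.4f}")
```

Weighting the auxiliary term keeps the conversion loss primary while still pushing the intermediate encoder representation toward phoneme-discriminative features; in a real system the weight would be tuned on held-out data.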
Related papers
- Attention-guided Generative Adversarial Network For Whisper To Normal Speech Conversion (2021)
- Generative Models For Improved Naturalness, Intelligibility, And Voicing Of Whispered Speech (2022)
- Whispervc: Decoupled Cross-domain Alignment And Speech Generation For Low-resource Whisper-to-normal Conversion (2025)
- On The Transferability Of Whisper-based Representations For "in-the-wild" Cross-task Downstream Speech Applications (2023)
- Whisper Speaker Identification: Leveraging Pre-trained Multilingual Transformers For Robust Speaker Embeddings (2025)
- A Whisper Transformer For Audio Captioning Trained With Synthetic Captions And Transfer Learning (2023)
- Whispered-to-voiced Alaryngeal Speech Conversion With Generative Adversarial Networks (2018)
- Transformer-transducer: End-to-end Speech Recognition With Self-attention (2019)