Attention-guided Generative Adversarial Network For Whisper To Normal Speech Conversion
2021 · Teng Gao, Jian Zhou, Huabin Wang, et al.
Abstract
Whispered speech is a special mode of pronunciation that does not use vocal cord vibration. It contains no fundamental frequency, and its energy is about 20 dB lower than that of normal speech. Converting whispered speech into normal speech can improve speech quality and intelligibility. In this paper, a novel attention-guided generative adversarial network model for whisper-to-normal speech conversion (AGAN-W2SC) is proposed, incorporating an autoencoder, a Siamese neural network, and an identity mapping loss function. The proposed method avoids the challenge of estimating the fundamental frequency of the normal voiced speech converted from whispered speech. Moreover, the proposed model is more amenable to practical applications because it does not need to align speech features for training. Experimental results demonstrate that the proposed AGAN-W2SC obtains improved speech quality and intelligibility compared with dynamic-time-warping-based methods.
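To give a sense of scale for the roughly 20 dB energy gap mentioned in the abstract, the following is a minimal sketch of the standard decibel arithmetic. It is not taken from the paper; the function names and the toy sample values are assumptions made purely for illustration.

```python
import math


def db_to_power_ratio(db):
    """Convert a level difference in dB to a power (energy) ratio."""
    return 10.0 ** (db / 10.0)


def db_to_amplitude_ratio(db):
    """Convert a level difference in dB to an amplitude ratio."""
    return 10.0 ** (db / 20.0)


def mean_energy_db(samples):
    """Mean energy of a sample sequence, expressed in dB."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


# A gap of about -20 dB, as stated in the abstract.
gap_db = -20.0
power_ratio = db_to_power_ratio(gap_db)          # energy ratio whisper / normal
amplitude_ratio = db_to_amplitude_ratio(gap_db)  # amplitude ratio whisper / normal

# Toy signals (hypothetical values): scale a "normal" signal by the
# amplitude ratio to mimic a whisper-level signal.
normal = [0.5, -0.5, 0.5, -0.5]
whisper_level = [x * amplitude_ratio for x in normal]
measured_gap_db = mean_energy_db(whisper_level) - mean_energy_db(normal)
```

Under these definitions, a 20 dB gap corresponds to about one hundredth of the energy and one tenth of the amplitude of normal speech.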
Related papers
- Whispered-to-voiced Alaryngeal Speech Conversion With Generative Adversarial Networks (2018)
- Generative Models For Improved Naturalness, Intelligibility, And Voicing Of Whispered Speech (2022)
- WhisperVC: Decoupled Cross-domain Alignment And Speech Generation For Low-resource Whisper-to-normal Conversion (2025)
- End-to-end Whisper To Natural Speech Conversion Using Modified Transformer Network (2020)
- Vocoder-free Non-parallel Conversion Of Whispered Speech With Masked Cycle-consistent Generative Adversarial Networks (2023)
- CinC-GAN For Effective F0 Prediction For Whisper-to-normal Speech Conversion (2020)
- SEGAN: Speech Enhancement Generative Adversarial Network (2017)
- Subband-based Generative Adversarial Network For Non-parallel Many-to-many Voice Conversion (2022)