Non-parallel Voice Conversion System With WaveNet Vocoder And Collapsed Speech Suppression
2020 · Yi-Chiao Wu, Patrick Lumban Tobing, Kazuhiro Kobayashi, et al.
Abstract
In this paper, we integrate a simple non-parallel voice conversion (VC) system with a WaveNet (WN) vocoder and a proposed collapsed speech suppression technique. The effectiveness of WN as a vocoder for generating high-fidelity speech waveforms on the basis of acoustic features has been confirmed in recent works. However, when combining the WN vocoder with a VC system, the distorted acoustic features, acoustic and temporal mismatches, and exposure bias usually lead to significant speech quality degradation, making WN generate some very noisy speech segments called collapsed speech. To tackle the problem, we take conventional-vocoder-generated speech as the reference speech to derive a linear predictive coding distribution constraint (LPCDC) to avoid the collapsed speech problem. Furthermore, to mitigate the negative effects introduced by the LPCDC, we propose a collapsed speech segment detector (CSSD) to ensure that the LPCDC is only applied to the problematic segments to limit the los
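The idea of flagging only the problematic segments, so that a constraint is applied to those segments alone, can be illustrated with a small sketch. The abstract does not say how the CSSD decides that a segment has collapsed. The frame-power comparison against the reference speech, the function names, the frame length, and the 6 dB threshold below are all assumptions made for illustration, not the authors' method.

```python
import math


def frame_power_db(signal, frame_len):
    """Mean power of each non-overlapping frame, in dB."""
    powers = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        # Small floor avoids log10(0) on silent frames.
        powers.append(10.0 * math.log10(mean_sq + 1e-12))
    return powers


def detect_collapsed_frames(generated, reference, frame_len=4, threshold_db=6.0):
    """Flag frames where the generated speech is much louder than the reference.

    Hypothetical criterion: a frame is flagged when its power exceeds the
    reference frame power by more than threshold_db.
    """
    gen_db = frame_power_db(generated, frame_len)
    ref_db = frame_power_db(reference, frame_len)
    return [g - r > threshold_db for g, r in zip(gen_db, ref_db)]


def flagged_segments(flags, frame_len):
    """Merge consecutive flagged frames into (start_sample, end_sample) ranges."""
    segments = []
    start = None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i * frame_len
        elif not flag and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:
        segments.append((start, len(flags) * frame_len))
    return segments


# Toy signals: the second frame of the generated waveform is much louder.
reference = [0.1] * 8
generated = [0.1] * 4 + [1.0] * 4
flags = detect_collapsed_frames(generated, reference)
segments = flagged_segments(flags, frame_len=4)
```

In a full system, only the sample ranges in `segments` would be regenerated with the constraint, leaving all other segments untouched.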
Related papers
- Collapsed Speech Segment Detection And Suppression For WaveNet Vocoder (2018) · 9.03
- A Vocoder-free WaveNet Voice Conversion With Non-parallel Data (2019) · 0.00
- Statistical Voice Conversion With Quasi-periodic WaveNet Vocoder (2019) · 3.58
- Quasi-periodic WaveNet Vocoder: A Pitch Dependent Dilated Convolution Model For Parametric Speech Generation (2019) · 7.50
- VC-ENHANCE: Speech Restoration With Integrated Noise Suppression And Voice Conversion (2024) · 0.00
- Vocoder-free Non-parallel Conversion Of Whispered Speech With Masked Cycle-consistent Generative Adversarial Networks (2023) · 0.00
- Refined WaveNet Vocoder For Variational Autoencoder Based Voice Conversion (2018) · 7.50
- AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion (2021) · 7.50