WaveCycleGAN2: Time-domain Neural Post-filter for Speech Waveform Generation
2019 · Kou Tanaka, Hirokazu Kameoka, Takuhiro Kaneko, et al.
Abstract
WaveCycleGAN was recently proposed to bridge the gap between natural and synthesized speech waveforms in statistical parametric speech synthesis. It provides fast inference by using a moving average model rather than an autoregressive model, and high-quality speech synthesis through adversarial training. However, the human ear can still distinguish the processed speech waveforms from natural ones. One possible cause of this distinguishability is the aliasing that the down/up-sampling modules introduce into the processed speech waveform. To resolve the aliasing and provide higher-quality speech synthesis, we propose WaveCycleGAN2, which 1) uses generators without down/up-sampling modules and 2) combines discriminators of the waveform domain and acoustic parameter domain. The results show that the proposed method 1) alleviates the aliasing well, 2) is useful for both speech waveforms generated by analysis-and-synthesis and statistical parametric speech synthesis, and 3) achieves a mean opinion sc
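The aliasing that the abstract attributes to down/up-sampling modules can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: it only shows the general effect of reducing the sample rate without a low-pass filter. The sample rate, tone frequency, and the helper `dominant_frequency` are assumptions chosen for illustration.

```python
import numpy as np


def dominant_frequency(signal, sample_rate):
    """Return the frequency (Hz) of the largest-magnitude rFFT bin."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])


# Assumed illustrative values, not taken from the paper.
sample_rate = 16000   # Hz
tone_hz = 6000.0      # above the Nyquist frequency of the downsampled signal

# One second of a pure tone.
t = np.arange(sample_rate) / sample_rate
x = np.sin(2 * np.pi * tone_hz * t)

# Naive downsampling: keep every second sample, with no anti-aliasing filter.
factor = 2
x_down = x[::factor]
down_rate = sample_rate // factor  # 8000 Hz, so Nyquist is 4000 Hz

original_peak = dominant_frequency(x, sample_rate)
aliased_peak = dominant_frequency(x_down, down_rate)

print(f"peak before downsampling: {original_peak:.0f} Hz")
print(f"peak after downsampling:  {aliased_peak:.0f} Hz")
```

In this sketch the 6000 Hz tone folds back to 2000 Hz (8000 minus 6000) once the rate is halved, so energy appears at a frequency that was absent from the input. A generator that never changes the temporal resolution avoids this particular source of distortion, which matches the motivation the abstract gives for removing the down/up-sampling modules.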
Related papers
- WaveCycleGAN: Synthetic-to-natural Speech Waveform Conversion Using Cycle-consistent Adversarial Networks (2018) · 9.92
- TFGAN: Time and Frequency Domain Based Generative Adversarial Network for High-fidelity Speech Synthesis (2020) · 0.00
- Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks with Multi-resolution Spectrogram (2019) · 0.00
- Waveform Generation for Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018) · 8.35
- Adversarial Audio Synthesis (2018) · 0.00
- Wave-U-Net Discriminator: Fast and Lightweight Discriminator for Generative Adversarial Network-based Speech Synthesis (2023) · 6.34
- Wavehax: Aliasing-free Neural Waveform Synthesis Based on 2D Convolution and Harmonic Prior for Reliable Complex Spectrogram Estimation (2024) · 0.00
- Parallel Waveform Synthesis Based on Generative Adversarial Networks with Voicing-aware Conditional Discriminators (2020) · 7.16