Avocodo: Generative Adversarial Network For Artifact-free Vocoder
2022 · Taejun Bak, Junmo Lee, Hanbin Bae, et al.
Abstract
Neural vocoders based on the generative adversarial neural network (GAN) have been widely used due to their fast inference speed and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform multi-scale analysis that evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments, we discovered that the multi-scale analysis which focuses on the low-frequency bands causes unintended artifacts, e.g., aliasing and imaging artifacts, which degrade the synthesized speech waveform quality. Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based vocoders and propose a GAN-based vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to
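The abstract names aliasing and imaging as the artifacts at stake. The sketch below shows, in general signal-processing terms, how each one arises. It is not Avocodo's method or any discriminator from the paper. Assumptions: a 64 Hz sample rate chosen so that DFT bin k equals k Hz, arbitrary test tones at 20 Hz and 12 Hz, and a brick-wall DFT low-pass as an idealized anti-aliasing filter (practical systems use FIR filters). All function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform (illustration only, not fast)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spec):
    """Inverse of dft(); returns the real part, assuming a real-valued signal."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def dft_magnitudes(x):
    """One-sided magnitude spectrum |X[k]| for k = 0..N/2."""
    return [abs(v) for v in dft(x)[: len(x) // 2 + 1]]


def peak_bin(x):
    """Index of the strongest non-DC bin in the one-sided spectrum."""
    mags = dft_magnitudes(x)
    return max(range(1, len(mags)), key=lambda k: mags[k])


def naive_downsample(x, factor):
    """Keep every `factor`-th sample with no low-pass filter (aliasing-prone)."""
    return x[::factor]


def ideal_lowpass(x, cutoff_bin):
    """Hypothetical brick-wall low-pass: zero every DFT bin above cutoff_bin."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        # bins k and n - k describe the same physical frequency
        if min(k, n - k) > cutoff_bin:
            spec[k] = 0
    return idft(spec)


def zero_insert_upsample(x, factor):
    """Insert factor - 1 zeros between samples, no interpolation filter (imaging-prone)."""
    out = []
    for v in x:
        out.append(v)
        out.extend([0.0] * (factor - 1))
    return out


if __name__ == "__main__":
    fs = 64  # assumed sample rate: 64 samples = 1 s, so bin k = k Hz
    x = [math.sin(2 * math.pi * 20 * t / fs) for t in range(fs)]

    # Aliasing: after 2x decimation the Nyquist limit is 16 Hz,
    # so the 20 Hz tone folds down to 32 - 20 = 12 Hz.
    print(peak_bin(naive_downsample(x, 2)))  # 12

    # Low-pass filtering at the new Nyquist first removes the tone instead of folding it.
    filtered = naive_downsample(ideal_lowpass(x, 16), 2)
    print(max(dft_magnitudes(filtered)) < 1e-6)  # True

    # Imaging: zero-insertion upsampling of a 12 Hz tone (fs = 32 -> 64)
    # leaves a spectral image at 32 - 12 = 20 Hz with equal magnitude.
    y = [math.sin(2 * math.pi * 12 * t / 32) for t in range(32)]
    mags = dft_magnitudes(zero_insert_upsample(y, 2))
    print(round(mags[12], 3), round(mags[20], 3))  # 16.0 16.0
```

The same rule explains both cases: resampling without a low-pass filter lets content near one sample rate's Nyquist limit show up at a mirrored frequency at the other rate.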
Related papers
- VocGAN: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020) · 12.54
- VNet: A GAN-based Multi-tier Discriminator Network For Speech Synthesis Vocoders (2024) · 2.26
- AdaVocoder: Adaptive Vocoder For Custom Voice (2022) · 2.26
- A Post Auto-regressive GAN Vocoder Focused On Spectrum Fracture (2022) · 0.00
- BigVGAN: A Universal Neural Vocoder With Large-scale Training (2022) · 6.17
- Multi-scale Sub-band Constant-Q Transform Discriminator For High-fidelity Vocoder (2023) · 0.00
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019) · 3.58
- Training Generative Adversarial Network-based Vocoder With Limited Data Using Augmentation-conditional Discriminator (2024) · 2.26