Adversarial Audio Synthesis
2018 · Chris Donahue, Julian McAuley, Miller Puckette
Abstract
Audio signals are sampled at high temporal resolutions, and learning to synthesize audio requires capturing structure across a range of timescales. Generative adversarial networks (GANs) have seen wide success at generating images that are both locally and globally coherent, but they have seen little application to audio generation. In this paper we introduce WaveGAN, a first attempt at applying GANs to unsupervised synthesis of raw-waveform audio. WaveGAN is capable of synthesizing one-second slices of audio waveforms with global coherence, suitable for sound effect generation. Our experiments demonstrate that, without labels, WaveGAN learns to produce intelligible words when trained on a small-vocabulary speech dataset, and can also synthesize audio from other domains such as drums, bird vocalizations, and piano. We compare WaveGAN to a method that applies GANs designed for image generation to image-like audio feature representations, finding both approaches to be promising.
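The abstract contrasts two inputs a GAN could model: the raw waveform and an image-like feature representation. The sketch below makes the size difference concrete for a one-second slice. It assumes a 16 kHz sample rate, a 256-sample Hann window and a 128-sample hop, and uses a synthetic 440 Hz tone in place of real audio. These values, and the helper names `one_second_waveform` and `magnitude_spectrogram`, are illustrative choices only, not the preprocessing used in the paper.

```python
import numpy as np

# Illustrative parameters (assumptions, not taken from the paper).
SAMPLE_RATE = 16_000  # samples per second
DURATION_S = 1.0      # a one-second slice
FRAME_LEN = 256       # STFT window length in samples
HOP = 128             # STFT hop size in samples


def one_second_waveform(freq_hz: float = 440.0) -> np.ndarray:
    """Hypothetical stand-in for a one-second audio slice: a pure sine tone."""
    t = np.arange(int(SAMPLE_RATE * DURATION_S)) / SAMPLE_RATE
    return np.sin(2 * np.pi * freq_hz * t).astype(np.float32)


def magnitude_spectrogram(x: np.ndarray) -> np.ndarray:
    """Image-like (time x frequency) magnitude STFT using a Hann window.

    Taking np.abs keeps only magnitudes, so phase information is discarded.
    """
    window = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    frames = np.stack(
        [x[i * HOP : i * HOP + FRAME_LEN] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


wave = one_second_waveform()
spec = magnitude_spectrogram(wave)
print(wave.shape)  # (16000,): one value per sample, a long 1-D sequence
print(spec.shape)  # (124, 129): frames x frequency bins, a small 2-D "image"
```

Under these assumptions, the waveform is a 16,000-step sequence, while the spectrogram is a compact 2-D array that image-style GANs can consume directly. The cost of that convenience is that the magnitude-only representation must later be inverted back to audio without its phase.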
Related papers
- GANSynth: Adversarial Neural Audio Synthesis (2019) · 0.00
- Voice Command Generation Using Progressive WaveGANs (2019) · 0.00
- MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis (2019) · 0.00
- Conditional WaveGAN (2018) · 4.22
- Waveform Generation for Text-to-Speech Synthesis Using Pitch-Synchronous Multi-Scale Generative Adversarial Networks (2018) · 8.35
- High Fidelity Speech Synthesis with Adversarial Networks (2019) · 0.00
- HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis (2020) · 0.00
- Video-Driven Speech Reconstruction Using Generative Adversarial Networks (2019) · 11.39