Voice Command Generation Using Progressive WaveGANs
2019 · Thomas Wiest, Nicholas Cummins, Alice Baird, et al.
Abstract
Generative Adversarial Networks (GANs) have become exceedingly popular in a wide range of data-driven research fields, due in part to their success in image generation. Their ability to generate new samples, often from only a small amount of input data, makes them an exciting research tool in areas with limited data resources. One less-explored application of GANs is the synthesis of speech and audio samples. Herein, we propose a set of extensions to the WaveGAN paradigm, a recently proposed approach for sound generation using GANs. The aim of these extensions (preprocessing, Audio-to-Audio generation, skip connections, and progressive structures) is to improve the human likeness of synthetic speech samples. Scores from listening tests with 30 volunteers demonstrated a moderate improvement (Cohen's d coefficient of 0.65) in human likeness using the proposed extensions compared to the original WaveGAN approach.
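For readers unfamiliar with the effect size quoted above, the following is a minimal sketch of how Cohen's d can be computed from two sets of listening-test scores. It assumes the common pooled-standard-deviation definition; the abstract does not state which variant the authors used. The function name `cohens_d` and the rating values are hypothetical and serve only as an illustration, not as data from the paper.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using a pooled standard deviation (one common definition)."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical human-likeness ratings, NOT taken from the paper.
extended_scores = [4, 5, 3, 4, 4, 5]   # samples from an extended model
baseline_scores = [3, 4, 3, 3, 4, 3]   # samples from a baseline model

d = cohens_d(extended_scores, baseline_scores)
print(f"Cohen's d = {d:.2f}")
```

By Cohen's conventional benchmarks, values around 0.5 are read as a medium effect and values around 0.8 as a large one, which is consistent with the abstract describing 0.65 as a moderate improvement.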
Related papers
- Adversarial Audio Synthesis (2018) 0.00
- Waveform Generation For Text-to-speech Synthesis Using Pitch-synchronous Multi-scale Generative Adversarial Networks (2018) 8.35
- Conditional WaveGAN (2018) 4.22
- End-to-end Video-to-speech Synthesis Using Generative Adversarial Networks (2021) 11.58
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019) 3.58
- Generative Adversarial Network-based Glottal Waveform Model For Statistical Parametric Speech Synthesis (2019) 10.35
- Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks (2017) 16.21
- Generative Adversarial Network Based Speaker Adaptation For High Fidelity WaveNet Vocoder (2018) 5.84