Towards Achieving Robust Universal Neural Vocoding
2018 · Jaime Lorenzo-Trueba, Thomas Drugman, Javier Latorre, et al.
Abstract
This paper explores the potential universality of neural vocoders. We train a WaveRNN-based vocoder on 74 speakers from 17 languages. When the recording conditions are studio-quality, this vocoder generates speech of consistently good quality (98% relative mean MUSHRA compared to natural speech), regardless of whether the input spectrogram comes from a speaker or style seen during training or from an out-of-domain scenario. When the recordings show significant changes in quality, or when moving towards non-speech vocalizations or singing, the vocoder still significantly outperforms speaker-dependent vocoders, but operates at a lower average relative MUSHRA of 75%. These results are consistent across languages, whether seen during training (e.g. English or Japanese) or unseen (e.g. Wolof, Swahili, Amharic).
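The abstract reports quality as a "relative mean MUSHRA" percentage. As a rough illustration only, the sketch below assumes this means the mean MUSHRA score of the vocoded system divided by the mean score of natural speech, times 100; the abstract does not spell out the formula. The function name `relative_mushra` and all ratings are hypothetical, made-up values, not data from the paper.

```python
from statistics import mean


def relative_mushra(system_scores, natural_scores):
    """Mean system score as a percentage of the mean natural-speech score.

    Assumed definition (not confirmed by the abstract):
        100 * mean(system) / mean(natural)
    """
    if not system_scores or not natural_scores:
        raise ValueError("both score lists must be non-empty")
    natural_mean = mean(natural_scores)
    if natural_mean == 0:
        raise ValueError("mean natural-speech score must be non-zero")
    return 100.0 * mean(system_scores) / natural_mean


# Hypothetical listener ratings on the 0-100 MUSHRA scale (made-up numbers).
natural = [80, 90, 85, 85]
vocoded = [78, 88, 84, 83]
print(round(relative_mushra(vocoded, natural), 1))  # prints 97.9
```

Under this assumed definition, a value near 100% means listeners rated the vocoded speech almost as highly as the natural recordings, while a value such as 75% indicates a clear gap.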