Unsupervised Acoustic Unit Discovery For Speech Synthesis Using Discrete Latent-variable Neural Networks
2019 · Ryan Eloff, André Nortje, Benjamin van Niekerk, et al.
Abstract
For our submission to the ZeroSpeech 2019 challenge, we apply discrete latent-variable neural networks to unlabelled speech and use the discovered units for speech synthesis. Unsupervised discrete subword modelling could be useful for studies of phonetic category learning in infants or in low-resource speech technology requiring symbolic input. We use an autoencoder (AE) architecture with intermediate discretisation. We decouple acoustic unit discovery from speaker modelling by conditioning the AE's decoder on the training speaker identity. At test time, unit discovery is performed on speech from an unseen speaker, followed by unit decoding conditioned on a known target speaker to obtain reconstructed filterbanks. This output is fed to a neural vocoder to synthesise speech in the target speaker's voice. For discretisation, categorical variational autoencoders (CatVAEs), vector-quantised VAEs (VQ-VAEs) and straight-through estimation are compared at different compression levels on two l
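The test-time pipeline described in the abstract (discretise the encoder output, then decode conditioned on a target speaker) can be illustrated with a minimal sketch. It shows only a VQ-VAE-style nearest-neighbour codebook lookup, which is one of the discretisation methods the abstract names, and it is not the authors' implementation. The function names (`quantise`, `condition_on_speaker`), the toy codebook, and the speaker embedding are all hypothetical, and training details such as the straight-through gradient are omitted.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantise(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each encoder output frame to its nearest codebook entry.

    Returns the discrete unit indices and the corresponding code vectors.
    """
    indices: List[int] = []
    quantised: List[List[float]] = []
    for frame in frames:
        idx = min(
            range(len(codebook)),
            key=lambda k: squared_distance(frame, codebook[k]),
        )
        indices.append(idx)
        quantised.append(list(codebook[idx]))
    return indices, quantised


def condition_on_speaker(
    quantised: Sequence[Vector], speaker_embedding: Vector
) -> List[List[float]]:
    """Append the same speaker embedding to every quantised frame.

    Concatenation is an assumption here; it is one simple way to
    condition a decoder on speaker identity.
    """
    return [list(q) + list(speaker_embedding) for q in quantised]


# Toy values, chosen only for illustration.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
encoder_outputs = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]
target_speaker = [0.5, -0.5, 0.25]  # hypothetical embedding of a known speaker

unit_ids, quantised_frames = quantise(encoder_outputs, codebook)
decoder_inputs = condition_on_speaker(quantised_frames, target_speaker)

print(unit_ids)           # discovered unit sequence
print(decoder_inputs[0])  # first frame passed to the decoder
```

In this sketch the unit indices carry no speaker information of their own; the speaker identity enters only through the embedding appended before decoding, which is the decoupling the abstract describes.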
Related papers
- Transformer VQ-VAE For Unsupervised Unit Discovery And Speech Synthesis: Zerospeech 2020 Challenge (2020)
- Vector-quantized Neural Networks For Acoustic Unit Discovery In The Zerospeech 2020 Challenge (2020)
- VQVAE Unsupervised Unit Discovery And Multi-scale Code2spec Inverter For Zerospeech Challenge 2019 (2019)
- Combining Adversarial Training And Disentangled Speech Representation For Robust Zero-resource Subword Modeling (2019)
- Unsupervised Acoustic Unit Representation Learning For Voice Conversion Using Wavenet Auto-encoders (2020)
- Learning Hierarchical Discrete Linguistic Units From Visually-grounded Speech (2019)
- The Zero Resource Speech Challenge 2020: Discovering Discrete Subword And Word Units (2020)
- Unsupervised End-to-end Learning Of Discrete Linguistic Units For Voice Conversion (2019)