Unsupervised Acoustic Unit Representation Learning For Voice Conversion Using WaveNet Auto-encoders
2020 · Mingjie Chen, Thomas Hain
Abstract
Unsupervised representation learning of speech has attracted keen interest in recent years, as is evident, for example, in the wide interest in the ZeroSpeech challenges. This work presents a new method for learning frame-level representations based on WaveNet auto-encoders. Of particular interest in the ZeroSpeech Challenge 2019 were models with discrete latent variables, such as the Vector Quantized Variational Auto-Encoder (VQVAE). However, these models generate speech of relatively poor quality. In this work we aim to address this with two approaches: first, WaveNet is used as the decoder to generate waveform data directly from the latent representation; second, the low complexity of latent representations is improved with two alternative disentanglement learning methods, namely instance normalization and sliced vector quantization. The method was developed and tested in the context of the recent ZeroSpeech Challenge 2020. The system output submitted to the challenge obtained the
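The two disentanglement methods named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define sliced vector quantization, so the sketch assumes one plausible reading, in which each frame vector is split into equal slices and every slice is replaced by its nearest entry in a single shared codebook. The function names, the toy codebook, and the toy input are all hypothetical. Instance normalization is taken in its usual sense: each channel is normalized to zero mean and unit variance over time within one utterance, which removes utterance-level statistics that tend to carry speaker information.

```python
import math


def instance_norm(frames, eps=1e-5):
    """Normalize each channel over time within a single utterance.

    frames: list of T frames, each a list of C floats.
    """
    n_frames = len(frames)
    n_channels = len(frames[0])
    out = [[0.0] * n_channels for _ in range(n_frames)]
    for c in range(n_channels):
        column = [frame[c] for frame in frames]
        mean = sum(column) / n_frames
        var = sum((x - mean) ** 2 for x in column) / n_frames
        scale = 1.0 / math.sqrt(var + eps)
        for t in range(n_frames):
            out[t][c] = (frames[t][c] - mean) * scale
    return out


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector`."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vector, code))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


def sliced_vq(frame, codebook, n_slices):
    """ASSUMED reading of sliced VQ: quantize each slice of the frame
    separately against one shared codebook."""
    size = len(frame) // n_slices
    indices, quantized = [], []
    for s in range(n_slices):
        piece = frame[s * size:(s + 1) * size]
        idx = nearest_code(piece, codebook)
        indices.append(idx)
        quantized.extend(codebook[idx])
    return indices, quantized


# Toy utterance: 2 frames, 4 channels (hypothetical values).
utterance = [[1.0, 10.0, 0.2, 0.9],
             [3.0, 30.0, 0.8, 0.1]]
# Toy shared codebook of 2-dimensional codes (hypothetical values).
codebook = [[-1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]]

normalized = instance_norm(utterance)
units, _ = sliced_vq(normalized[0], codebook, n_slices=2)
print(units)  # [0, 2]
```

Under this reading, a frame is described by several small code indices rather than one, so the number of distinct frame representations grows with the number of slices without enlarging the codebook.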
Related papers
- Unsupervised Speech Representation Learning Using WaveNet Autoencoders (2019)
- Robust Disentangled Variational Speech Representation Learning For Zero-shot Voice Conversion (2022)
- Unsupervised Acoustic Unit Discovery For Speech Synthesis Using Discrete Latent-variable Neural Networks (2019)
- VQVC+: One-shot Voice Conversion By Vector Quantization And U-Net Architecture (2020)
- The NeteaseGames System For Voice Conversion Challenge 2020 With Vector-quantization Variational Autoencoder And WaveNet (2020)
- ACE-VC: Adaptive And Controllable Voice Conversion Using Explicitly Disentangled Self-supervised Speech Representations (2023)
- Training Robust Zero-shot Voice Conversion Models With Self-supervised Features (2021)
- Voice Conversion From Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks (2017)