AC-VC: Non-parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion
2021 · Damien Ronssin, Milos Cernak
Abstract
This paper presents AC-VC (Almost Causal Voice Conversion), a phonetic-posteriorgram-based voice conversion system that can perform any-to-many voice conversion with only 57.5 ms of future look-ahead. The complete system is composed of three neural networks trained separately on non-parallel data. While most current voice conversion systems focus primarily on quality irrespective of algorithmic latency, this work focuses on designing a method that uses a minimal amount of future context, thus allowing a future real-time implementation. According to a subjective listening test organized in this work, the proposed AC-VC system achieves parity in naturalness with the non-causal ASR-TTS baseline of the Voice Conversion Challenge 2020, with a MOS of 3.5. In contrast, the results indicate that the missing future context impacts speaker similarity: the obtained similarity percentage of 65% is lower than that of the current best voice conversion systems.
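The abstract's central constraint is the 57.5 ms future look-ahead. The sketch below is not the authors' implementation. It only illustrates, under stated assumptions, how algorithmic look-ahead accumulates when frame-level layers that each peek a few frames into the future are stacked, and what a convolution with a bounded right context looks like. The `Layer` class, the layer names, the hop size and the per-layer right context are hypothetical values chosen for illustration, not AC-VC's configuration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Layer:
    """Hypothetical frame-level layer that needs `right_frames` future frames."""
    name: str
    right_frames: int


def total_lookahead_ms(layers: Sequence[Layer], hop_ms: float,
                       window_lookahead_ms: float = 0.0) -> float:
    """Algorithmic look-ahead of a stack of layers running at one frame rate.

    Each layer's right context adds to the context of the layers before it,
    so the future frames sum; the analysis window's own look-ahead is added once.
    """
    future_frames = sum(layer.right_frames for layer in layers)
    return future_frames * hop_ms + window_lookahead_ms


def conv1d_limited_lookahead(x: List[float], weights: List[float],
                             right: int) -> List[float]:
    """1-D convolution whose output at frame t uses x[t-left .. t+right].

    len(weights) == left + right + 1; out-of-range frames are treated as zero.
    right == 0 gives a strictly causal filter.
    """
    left = len(weights) - right - 1
    if left < 0:
        raise ValueError("right context exceeds kernel size")
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - left + k
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


if __name__ == "__main__":
    # Illustrative (hypothetical) numbers only, not AC-VC's configuration.
    stack = [Layer("ppg_extractor", 3), Layer("conversion_model", 1)]
    print(total_lookahead_ms(stack, hop_ms=10.0, window_lookahead_ms=5.0))  # 45.0

    impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
    print(conv1d_limited_lookahead(impulse, [1.0, 2.0, 3.0], right=1))
    # [0.0, 0.0, 3.0, 2.0, 1.0, 0.0]: the impulse at frame 3 already affects frame 2
```

Setting `right=0` in every layer would make such a stack strictly causal. The trade-off the abstract reports, with naturalness unchanged but speaker similarity lower, is an empirical finding of the paper and is not modeled by this sketch.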
Related papers
- FastVC: Fast Voice Conversion With Non-parallel Data (2020) · 5.24
- Assem-VC: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques (2021) · 11.64
- ACVAE-VC: Non-parallel Many-to-many Voice Conversion With Auxiliary Classifier Variational Autoencoder (2018) · 14.69
- Towards Natural And Controllable Cross-lingual Voice Conversion Based On Neural TTS Model And Phonetic Posteriorgram (2021) · 0.00
- Voice Conversion Using Sequence-to-sequence Learning Of Context Posterior Probabilities (2017) · 11.39
- CVC: Contrastive Learning For Non-parallel Voice Conversion (2020) · 7.50
- Building Multi Lingual TTS Using Cross Lingual Voice Conversion (2020) · 0.00
- ControlVC: Zero-shot Voice Conversion With Time-varying Controls On Pitch And Speed (2022) · 6.77