Optimizing Voice Conversion Network With Cycle Consistency Loss Of Speaker Identity
2020 · Hongqiang Du, Xiaohai Tian, Lei Xie, et al.
Abstract
We propose a novel training scheme that optimizes a voice conversion network with a speaker identity loss function. The training scheme minimizes not only the frame-level spectral loss but also a speaker identity loss. We introduce a cycle consistency loss that constrains the converted speech to maintain the same speaker identity as the reference speech at the utterance level. While the proposed training scheme is applicable to any voice conversion network, we formulate the study under the average model voice conversion framework in this paper. Experiments conducted on the CMU-ARCTIC and CSTR-VCTK corpora confirm that the proposed method outperforms baseline methods in terms of speaker similarity.
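The abstract describes a training objective that combines a frame-level spectral loss with an utterance-level speaker identity loss. The following is a minimal sketch of how such a combined objective might look, not the authors' implementation. The abstract does not give the loss formulas, so everything specific here is an assumption: mean squared error for the spectral term, cosine distance between utterance embeddings for the identity term, a mean-pooled linear projection standing in for a real speaker encoder, and the hypothetical `identity_weight` parameter.

```python
import numpy as np


def spectral_loss(converted, target):
    """Frame-level term (assumed MSE) over arrays of shape (frames, bins)."""
    return float(np.mean((converted - target) ** 2))


def utterance_embedding(frames, projection):
    """Hypothetical stand-in for a speaker encoder.

    Projects every frame, mean-pools over time to get one vector per
    utterance, then L2-normalizes it.
    """
    pooled = (frames @ projection).mean(axis=0)
    return pooled / (np.linalg.norm(pooled) + 1e-8)


def speaker_identity_loss(converted, reference, projection):
    """Utterance-level term (assumed cosine distance between embeddings).

    The two utterances may have different numbers of frames, because the
    comparison happens after pooling over time.
    """
    e_converted = utterance_embedding(converted, projection)
    e_reference = utterance_embedding(reference, projection)
    return float(1.0 - np.dot(e_converted, e_reference))


def total_loss(converted, target, reference, projection, identity_weight=1.0):
    """Weighted sum of the two terms; identity_weight is an assumed knob."""
    frame_term = spectral_loss(converted, target)
    identity_term = speaker_identity_loss(converted, reference, projection)
    return frame_term + identity_weight * identity_term
```

In this sketch, `converted` and `target` must be time-aligned arrays of the same shape, while `reference` can be any utterance from the target speaker. Setting `identity_weight=0.0` recovers a purely spectral objective, which makes it easy to compare against a baseline trained without the identity term.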
Related papers
- Learning In Your Voice: Non-parallel Voice Conversion Based On Speaker Consistency Loss (2020)
- Autocycle-vc: Towards Bottleneck-independent Zero-shot Cross-lingual Voice Conversion (2023)
- Enhanced Exemplar Autoencoder With Cycle Consistency Loss In Any-to-one Voice Conversion (2022)
- Residual Speaker Representation For One-shot Voice Conversion (2023)
- Parallel-data-free Voice Conversion Using Cycle-consistent Adversarial Networks (2017)
- Many-to-many Voice Conversion Using Cycle-consistent Variational Autoencoder With Multiple Decoders (2019)
- Multi-target Voice Conversion Without Parallel Data By Adversarially Learning Disentangled Audio Representations (2018)
- Using Joint Training Speaker Encoder With Consistency Loss To Achieve Cross-lingual Voice Conversion And Expressive Voice Conversion (2023)