VCVTS: Multi-speaker Video-to-speech Synthesis Via Cross-modal Knowledge Transfer From Voice Conversion
2022 · Disong Wang, Shan Yang, Dan Su, et al.
Abstract
Though significant progress has been made for speaker-dependent Video-to-Speech (VTS) synthesis, little attention is devoted to multi-speaker VTS that can map silent video to speech, while allowing flexible control of speaker identity, all in a single system. This paper proposes a novel multi-speaker VTS system based on cross-modal knowledge transfer from voice conversion (VC), where vector quantization with contrastive predictive coding (VQCPC) is used for the content encoder of VC to derive discrete phoneme-like acoustic units, which are transferred to a Lip-to-Index (Lip2Ind) network to infer the index sequence of acoustic units. The Lip2Ind network can then substitute the content encoder of VC to form a multi-speaker VTS system to convert silent video to acoustic units for reconstructing accurate spoken content. The VTS system also inherits the advantages of VC by using a speaker encoder to produce speaker representations to effectively control the speaker identity of generated spe
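The index-transfer idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the codebook, the frame values, and the names `quantize`, `lookup`, and `lip_scores` are all hypothetical, and a plain nearest-neighbour lookup stands in for the VQCPC content encoder. It only shows how discrete unit indices derived from audio could serve as targets for a lip-side predictor, whose predicted indices then select the same acoustic units.

```python
import math

def quantize(frames, codebook):
    """Map each continuous frame to the index of its nearest codebook vector."""
    indices = []
    for frame in frames:
        dists = [math.dist(frame, code) for code in codebook]
        indices.append(dists.index(min(dists)))
    return indices

def lookup(indices, codebook):
    """Turn an index sequence back into the corresponding codebook vectors."""
    return [codebook[i] for i in indices]

# Hypothetical 4-entry codebook of 2-D "acoustic units".
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Audio side (assumed stand-in for the VC content encoder):
# quantizing audio frames yields the target index sequence.
audio_frames = [[0.1, -0.1], [0.9, 0.2], [0.8, 0.9]]
target_indices = quantize(audio_frames, codebook)

# Video side (assumed stand-in for a Lip2Ind-style predictor):
# one score per codebook entry for each video frame; argmax picks the index.
lip_scores = [
    [2.0, 0.1, 0.3, 0.0],
    [0.2, 1.5, 0.1, 0.4],
    [0.0, 0.3, 0.2, 1.9],
]
predicted_indices = [max(range(len(row)), key=row.__getitem__) for row in lip_scores]

# The predicted indices select acoustic units from the shared codebook,
# so the lip-side network can take the place of the audio content encoder.
units = lookup(predicted_indices, codebook)
```

In a real system the scores would come from a trained network and the selected units would feed a decoder conditioned on a speaker representation; both are omitted here.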
Related papers
- Transfer Learning From Speech Synthesis To Voice Conversion With Non-parallel Training Data (2020)
- AVQVC: One-shot Voice Conversion By Vector Quantization With Applying Contrastive Learning (2022)
- VQMIVC: Vector Quantization And Mutual Information-based Unsupervised Speech Representation Disentanglement For One-shot Voice Conversion (2021)
- Transfer Learning From Monolingual ASR To Transcription-free Cross-lingual Voice Conversion (2020)
- Emotional Voice Conversion Using Multitask Learning With Text-to-speech (2019)
- Assem-vc: Realistic Voice Conversion By Assembling Modern Speech Synthesis Techniques (2021)
- SIG-VC: A Speaker Information Guided Zero-shot Voice Conversion System For Both Human Beings And Machines (2021)
- VQVC+: One-shot Voice Conversion By Vector Quantization And U-net Architecture (2020)