
ZSV2C-MLLM: Zero-Shot Visual Voice Cloning Via Multimodal Large Language Models

Yanling Zhang · Linqin Wang · Shengxiang Gao · 2026


Abstract

(no abstract)

Related papers

Ranked by semantic similarity — how closely each paper's abstract matches this one (100% = near-identical topic).

X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning (2026) · 72% match
MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning (2026) · 72% match
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models (2023) · 70% match
Multi-modal Adversarial Training for Zero-Shot Voice Cloning (2024) · 70% match
Low-Resource Multilingual and Zero-Shot Multispeaker TTS (2022) · 69% match
The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024 (2024) · 69% match
Improve few-shot voice cloning using multi-modal learning (2022) · 69% match
From A to B to A: Palindromic Zero-Shot Voice Conversion with Non-Parallel Data (2026) · 68% match