Awesome Speech Audio

ZSV2C-MLLM: Zero-Shot Visual Voice Cloning Via Multimodal Large Language Models

Yanling Zhang · Linqin Wang · Shengxiang Gao · 2026
arXiv:s2_b755d2dfd70a
Voice Cloning

Abstract

(no abstract)

Related papers

  • X-Voice: Enabling Everyone to Speak 30 Languages via Zero-Shot Cross-Lingual Voice Cloning (2026)
  • MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning (2026)
  • LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models (2023)
  • Multi-modal Adversarial Training for Zero-Shot Voice Cloning (2024)
  • Low-Resource Multilingual and Zero-Shot Multispeaker TTS (2022)
  • MimicLM: Zero-Shot Voice Imitation through Autoregressive Modeling of Pseudo-Parallel Speech Corpora (2026)
  • OneVoice: One Model, Triple Scenarios-Towards Unified Zero-shot Voice Conversion (2026)
  • The Codec Language Model-based Zero-Shot Spontaneous Style TTS System for CoVoC Challenge 2024 (2025)
