PolySinger: Singing-voice To Singing-voice Translation From English To Japanese
2024 · Silas Antonisen, Iván López-Espejo
Abstract
The speech domain remains in the spotlight for several natural language processing (NLP) tasks, while the singing domain remains less explored. The culmination of NLP is the speech-to-speech translation (S2ST) task, which refers to the translation and synthesis of human speech. A disparity between S2ST and its possible adaptation to the singing domain, which we describe as singing-voice to singing-voice translation (SV2SVT), is becoming prominent: the former is progressing ever faster, while the latter is at a standstill. Singing-voice synthesis systems are overcoming the barrier of multi-lingual synthesis, although limited attention has been paid to multi-lingual songwriting and song translation. This paper endeavors to determine what is required for successful SV2SVT and proposes PolySinger (Polyglot Singer): the first system for SV2SVT, performing lyrics translation from English to Japanese. A cascaded approach is proposed to establish a framework with a high degree of control which can p
Related papers
- BiSinger: Bilingual Singing Voice Synthesis (2023)
- Prompt-Singer: Controllable Singing-voice-synthesis With Natural Language Prompt (2024)
- Everyone-Can-Sing: Zero-shot Singing Voice Synthesis And Conversion With Speech Reference (2025)
- N-Singer: A Non-autoregressive Korean Singing Voice Synthesis System For Pronunciation Enhancement (2021)
- TechSinger: Technique Controllable Multilingual Singing Voice Synthesis Via Flow Matching (2025)
- SiFiSinger: A High-fidelity End-to-end Singing Voice Synthesizer Based On Source-filter Model (2024)
- TCSinger: Zero-shot Singing Voice Synthesis With Style Transfer And Multi-level Style Control (2024)
- UniSyn: An End-to-end Unified Model For Text-to-speech And Singing Voice Synthesis (2022)