Learn To Sing By Listening: Building Controllable Virtual Singer By Unsupervised Learning From Voice Recordings
2023 · Wei Xue, Yiwen Wang, Qifeng Liu, et al.
Abstract
A virtual world is being built in which digital humans are indistinguishable from real ones. Giving them audio capabilities is crucial, since voice conveys extensive personal characteristics. We aim to create a controllable audio-form virtual singer; however, supervised modeling and control of all the factors of the singing voice, such as timbre, tempo, pitch, and lyrics, is extremely difficult because accurately labeling all this information requires enormous manual effort. In this paper, we propose a framework that digitizes a person's voice by simply "listening" to clean voice recordings of any content in a fully unsupervised manner, and that can predict singing voices even when only speaking recordings are available. We develop a variational auto-encoder (VAE) based framework that leverages a set of pre-trained models to encode the audio as hidden embeddings representing different factors of the singing voice, and further decodes the embeddings into raw au
Related papers
- VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024) · 4.52
- Deep Audio-visual Singing Voice Transcription Based On Self-supervised Learning Models (2023) · 0.00
- Everyone-can-sing: Zero-shot Singing Voice Synthesis And Conversion With Speech Reference (2025) · 0.00
- Vevo2: A Unified And Controllable Framework For Speech And Singing Voice Generation (2025) · 0.00
- CSSinger: End-to-end Chunkwise Streaming Singing Voice Synthesis System Based On Conditional Variational Autoencoder (2024) · 0.00
- Semi-supervised Learning For Singing Synthesis Timbre (2020) · 3.58
- Deep Encoder-decoder Models For Unsupervised Learning Of Controllable Speech Synthesis (2018) · 0.00
- Learning And Controlling The Source-filter Representation Of Speech With A Variational Autoencoder (2022) · 7.50