Period Singer: Integrating Periodic And Aperiodic Variational Autoencoders For Natural-sounding End-to-end Singing Voice Synthesis
2024 · Taewoo Kim, Choongsang Cho, Young Han Lee
Abstract
In this paper, we present Period Singer, a novel end-to-end singing voice synthesis (SVS) model that utilizes variational inference for periodic and aperiodic components, aimed at producing natural-sounding waveforms. Recent end-to-end SVS models have demonstrated the capability of synthesizing high-fidelity singing voices. However, owing to deterministic pitch conditioning, they do not fully address the one-to-many problem. To address this problem, we present the Period Singer architecture, which integrates variational autoencoders for the periodic and aperiodic components. Additionally, our methodology eliminates the dependency on an external aligner by estimating the phoneme alignment through a monotonic alignment search within note boundaries. Our empirical evaluations show that Period Singer outperforms existing end-to-end SVS models on Mandarin and Korean datasets. The efficacy of the proposed method was further corroborated by ablation studies.
Related papers
- VISinger: Variational Inference With Adversarial Learning For End-to-end Singing Voice Synthesis (2021)
- CSSinger: End-to-end Chunkwise Streaming Singing Voice Synthesis System Based On Conditional Variational Autoencoder (2024)
- SiFiSinger: A High-fidelity End-to-end Singing Voice Synthesizer Based On Source-filter Model (2024)
- VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024)
- Period VITS: Variational Inference With Explicit Pitch Modeling For End-to-end Emotional Speech Synthesis (2022)
- VISinger 2: High-fidelity End-to-end Singing Voice Synthesis Enhanced By Digital Signal Processing Synthesizer (2022)
- Towards Improving The Expressiveness Of Singing Voice Synthesis With BERT Derived Semantic Information (2023)
- ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-decoder Acoustic Models And WaveRNN Vocoders (2020)