SiFiSinger: A High-fidelity End-to-end Singing Voice Synthesizer Based On Source-filter Model
2024 · Jianwei Cui, Yu Gu, Chao Weng, et al.
Abstract
This paper presents an advanced end-to-end singing voice synthesis (SVS) system based on the source-filter mechanism that directly translates lyrical and melodic cues into expressive, high-fidelity, human-like singing. Like VISinger 2, the proposed system uses training paradigms evolved from VITS and incorporates components such as a fundamental pitch (F0) predictor and a waveform generation decoder. Because coupling mel-spectrogram features with F0 information may introduce errors during F0 prediction, we adopt two strategies. First, we leverage mel-cepstrum (mcep) features to decouple the intertwined mel-spectrogram and F0 characteristics. Second, inspired by neural source-filter models, we introduce source excitation signals as the representation of F0 in the SVS system, aiming to capture pitch nuances more accurately. Meanwhile, differentiable mcep and F0 losses are employed as the waveform decoder supervision to fortify the predict
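As a rough illustration of the source-excitation idea mentioned in the abstract, the sketch below builds a sample-level excitation signal from a frame-level F0 contour, in the general style of neural source-filter models. It is not the paper's implementation: the function name `sine_excitation`, the sample rate, hop size, amplitude, and noise levels are all assumptions chosen for illustration.

```python
import math
import random


def sine_excitation(f0_frames, sample_rate=24000, hop_size=240,
                    amplitude=0.1, noise_std=0.003, seed=0):
    """Turn a frame-level F0 contour (Hz) into a sample-level excitation.

    Hypothetical helper, not taken from the paper. Voiced frames (F0 > 0)
    yield a sine wave whose phase is the running sum of 2*pi*F0/sample_rate,
    plus a little Gaussian noise. Unvoiced frames (F0 == 0) yield Gaussian
    noise only. All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    two_pi = 2.0 * math.pi
    phase = 0.0
    signal = []
    for f0 in f0_frames:
        # Each frame-level F0 value is held constant for hop_size samples.
        for _ in range(hop_size):
            if f0 > 0:
                # Accumulate phase so the sine stays continuous across frames.
                phase = (phase + two_pi * f0 / sample_rate) % two_pi
                sample = amplitude * math.sin(phase) + rng.gauss(0.0, noise_std)
            else:
                # Unvoiced: noise only; the phase is left where it was.
                sample = rng.gauss(0.0, amplitude / 3.0)
            signal.append(sample)
    return signal


if __name__ == "__main__":
    # Two voiced frames at 220 Hz followed by one unvoiced frame.
    excitation = sine_excitation([220.0, 220.0, 0.0])
    print(len(excitation))  # 3 frames * 240 samples per frame
```

Accumulating the phase, rather than evaluating the sine per frame, keeps the waveform continuous when F0 changes between frames, which is the property that makes such a signal a usable pitch representation for a waveform decoder.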
Related papers
- VISinger: Variational Inference With Adversarial Learning For End-to-end Singing Voice Synthesis (2021)
- VISinger 2: High-fidelity End-to-end Singing Voice Synthesis Enhanced By Digital Signal Processing Synthesizer (2022)
- VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024)
- DiffSinger: Singing Voice Synthesis Via Shallow Diffusion Mechanism (2021)
- Towards Improving The Expressiveness Of Singing Voice Synthesis With BERT Derived Semantic Information (2023)
- CSSinger: End-to-end Chunkwise Streaming Singing Voice Synthesis System Based On Conditional Variational Autoencoder (2024)
- Period Singer: Integrating Periodic And Aperiodic Variational Autoencoders For Natural-sounding End-to-end Singing Voice Synthesis (2024)
- ConSinger: Efficient High-fidelity Singing Voice Generation With Minimal Steps (2024)