SingOMD: Singing Oriented Multi-resolution Discrete Representation Construction From Speech Models
2024 · Yuxun Tang, Yuning Wu, Jiatong Shi, et al.
Abstract
Discrete representations have shown advantages in speech generation tasks, in which discrete tokens are derived by discretizing hidden features from self-supervised learning (SSL) pre-trained models. However, directly applying speech SSL models to singing generation runs into the domain gap between speech and singing. Furthermore, singing generation requires a more fine-grained representation than typical speech does. To address these challenges, we introduce SingOMD, a novel method that extracts singing-oriented multi-resolution discrete representations from speech SSL models. Specifically, we first adapt the features from speech SSL models through a resynthesis task and incorporate resampling-based multi-resolution modules to better serve singing generation. These adapted multi-resolution features are then discretized via clustering. Extensive experiments demonstrate the robustness, efficiency, and effectiveness of these representations in singing vocoders and singing voice synthesis.
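The two generic steps named in the abstract, resampling SSL features to several frame rates and then discretizing each stream by clustering, can be illustrated with a minimal NumPy sketch. This is not the SingOMD implementation. It assumes a random matrix standing in for SSL hidden features, a 50 Hz source frame rate, and illustrative target rates of 25/50/100 Hz; linear interpolation stands in for the paper's resampling modules, plain k-means is assumed as the clustering step, and the resynthesis-based adaptation is omitted entirely. The helper names `resample_features`, `kmeans`, and `assign` are hypothetical.

```python
import numpy as np


def resample_features(feats: np.ndarray, src_rate: float, tgt_rate: float) -> np.ndarray:
    """Linearly interpolate a (T, D) feature sequence from src_rate to tgt_rate frames/s.

    Assumption: simple interpolation stands in for SingOMD's resampling modules.
    """
    num_src = feats.shape[0]
    duration = num_src / src_rate
    num_tgt = max(1, int(round(duration * tgt_rate)))
    # Timestamps (in seconds) of source and target frame centers.
    src_t = (np.arange(num_src) + 0.5) / src_rate
    tgt_t = (np.arange(num_tgt) + 0.5) / tgt_rate
    # Interpolate each feature dimension independently; np.interp clamps at the edges.
    cols = [np.interp(tgt_t, src_t, feats[:, d]) for d in range(feats.shape[1])]
    return np.stack(cols, axis=1)


def assign(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Map each frame to the index of its nearest centroid, i.e. its discrete token."""
    dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means (assumed clustering method); returns a (k, D) codebook."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(x, centers)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the previous center if a cluster empties
                centers[j] = members.mean(axis=0)
    return centers


# Hypothetical stand-in for SSL hidden features: 4 s of 16-dim frames at 50 Hz.
rng = np.random.default_rng(0)
ssl_feats = rng.standard_normal((200, 16))

tokens = {}
for rate in (25.0, 50.0, 100.0):  # illustrative set of resolutions
    feats = resample_features(ssl_feats, src_rate=50.0, tgt_rate=rate)
    codebook = kmeans(feats, k=8)
    tokens[rate] = assign(feats, codebook)

for rate, seq in tokens.items():
    print(rate, seq.shape)
```

Each resolution yields its own token stream and codebook. Fitting the codebook on a single utterance is only for brevity; a real system would fit it on a training corpus and reuse it at inference time.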
Related papers
- MMM: Multi-layer Multi-residual Multi-stream Discrete Speech Representation From Self-supervised Learning Model (2024)
- Hierarchical Disentangled Representation Learning For Singing Voice Conversion (2021)
- SingGAN: Generative Adversarial Network For High-fidelity Singing Voice Generation (2021)
- DDSP-based Singing Vocoders: A New Subtractive-based Synthesizer And A Comprehensive Evaluation (2022)
- Everyone-Can-Sing: Zero-shot Singing Voice Synthesis And Conversion With Speech Reference (2025)
- Adversarially Trained Multi-singer Sequence-to-sequence Singing Synthesizer (2020)
- SingMOS: An Extensive Open-source Singing Voice Dataset For MOS Prediction (2024)
- CoMelSinger: Discrete Token-based Zero-shot Singing Synthesis With Structured Melody Control And Guidance (2025)