N-Singer: A Non-autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement
2021 · Gyeong-Hoon Lee, Tae-Woo Kim, Hanbin Bae, et al.
Abstract
Recently, end-to-end Korean singing voice systems have been designed to generate realistic singing voices. However, these systems still suffer from a lack of robustness in terms of pronunciation accuracy. In this paper, we propose N-Singer, a non-autoregressive Korean singing voice system, to synthesize accurate and pronounced Korean singing voices in parallel. N-Singer consists of a Transformer-based mel-generator, a convolutional network-based postnet, and voicing-aware discriminators. It makes three contributions. First, for accurate pronunciation, N-Singer separately models linguistic and pitch information without other acoustic features. Second, to achieve improved mel-spectrograms, N-Singer uses a combination of Transformer-based modules and convolutional network-based modules. Third, in adversarial training, voicing-aware conditional discriminators are used to capture the harmonic features of voiced segments and the noise components of unvoiced segments. The experimental
Related papers
- Adversarially Trained End-to-end Korean Singing Voice Synthesis System (2019)
- SingGAN: Generative Adversarial Network for High-fidelity Singing Voice Generation (2021)
- NNSVS: A Neural Network-based Singing Voice Synthesis Toolkit (2022)
- SiFiSinger: A High-fidelity End-to-end Singing Voice Synthesizer Based on Source-filter Model (2024)
- Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-sounding End-to-end Singing Voice Synthesis (2024)
- Adversarially Trained Multi-singer Sequence-to-sequence Singing Synthesizer (2020)
- XiaoiceSing 2: A High-fidelity Singing Voice Synthesizer Based on Generative Adversarial Network (2022)
- Multi-Singer: Fast Multi-singer Singing Voice Vocoder with a Large-scale Corpus (2021)