Central Kurdish Text-to-speech Synthesis With Novel End-to-end Transformer Training
2024 · Hawraz A. Ahmad, Tarik A. Rashid
Abstract
Recent text-to-speech (TTS) models have aimed to streamline the traditional two-stage process into a single-stage training approach. However, many single-stage models still lag behind in audio quality, particularly when handling Kurdish text and speech. There is a critical need to improve text-to-speech conversion for Kurdish, especially the Sorani dialect, which is underrepresented in recent text-to-speech advancements. This study introduces an end-to-end TTS model for efficiently generating high-quality Kurdish audio. The proposed method uses a variational autoencoder (VAE) that is pre-trained for audio waveform reconstruction and augmented by adversarial training. Training aligns the prior distribution established by the pre-trained encoder with the posterior distribution of the text encoder in the latent space. Additionally, a stochastic duration predictor is incorporated to imbue synthesized Kurdish speech
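The abstract does not say which loss performs the alignment between the two latent distributions. As a minimal sketch, assuming both are diagonal Gaussians and that a KL divergence term is used (a common choice in VAE-based TTS, but an assumption here), the quantity could be computed as follows. The function name `kl_diag_gaussians` and the choice of which distribution plays `q` and which plays `p` are hypothetical and not taken from the paper.

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for two diagonal Gaussians given per-dimension means and log-variances.

    Hypothetical helper for illustration only; the paper's actual loss may differ.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # Closed-form KL for one dimension, summed over all dimensions.
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

# Identical distributions: the divergence vanishes.
same = kl_diag_gaussians([0.0, 1.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0])

# Unit-variance distributions whose means differ by 1 in each of two dimensions.
shifted = kl_diag_gaussians([1.0, 1.0], [0.0, 0.0], [0.0, 0.0], [0.0, 0.0])

print(same, shifted)
```

Driving such a term toward zero during training pulls the two distributions together. In a real model the means and log-variances would be tensors produced by the encoders rather than Python lists.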
Related papers
- Conditional Variational Autoencoder With Adversarial Learning For End-to-end Text-to-speech (2021)
- End-to-end Adversarial Text-to-speech (2020)
- VITS2: Improving Quality And Efficiency Of Single-stage Text-to-speech With Adversarial Learning And Architecture Design (2023)
- Generative Adversarial Training For Text-to-speech Synthesis Based On Raw Phonetic Input And Explicit Prosody Modelling (2023)
- Towards Transfer Learning For End-to-end Speech Synthesis From Deep Pre-trained Language Models (2019)
- Speaker Diarization For Low-resource Languages Through Wav2vec Fine-tuning (2025)
- DelightfulTTS 2: End-to-end Speech Synthesis With Adversarial Vector-quantized Auto-encoders (2022)
- VAENAR-TTS: Variational Auto-encoder Based Non-autoregressive Text-to-speech Synthesis (2021)