LJSpeech

Canonical

37 papers using it

First seen in 2021

This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. A transcription is provided for each clip. Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours.
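As a rough sketch of how the per-clip transcriptions might be read, the snippet below parses a pipe-delimited metadata listing. The layout (clip ID, raw transcription, normalized transcription) is an assumption not stated on this page, and the sample rows and the `read_metadata` helper are hypothetical, written only for illustration. The last line checks the figures quoted above: about 24 hours spread over 13,100 clips gives a mean clip length of roughly 6.6 seconds.

```python
import csv
import io

# Hypothetical two-row sample in the layout assumed here (not from this page):
# clip ID | raw transcription | normalized transcription
SAMPLE = (
    "LJ001-0001|Example text, no. 1|Example text, number one\n"
    "LJ001-0002|Another clip|Another clip\n"
)


def read_metadata(handle):
    """Return a dict mapping clip ID to (raw, normalized) transcription."""
    # QUOTE_NONE keeps quotation marks inside transcriptions as literal text.
    reader = csv.reader(handle, delimiter="|", quoting=csv.QUOTE_NONE)
    # Rows that do not have exactly three fields are skipped.
    return {row[0]: (row[1], row[2]) for row in reader if len(row) == 3}


clips = read_metadata(io.StringIO(SAMPLE))

# Mean clip length implied by the description: 24 hours over 13,100 clips.
mean_seconds = 24 * 3600 / 13100
```

With a real copy of the dataset, the same helper could be given an open file handle instead of the in-memory sample, provided the file follows the assumed layout.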

Papers using LJSpeech (37)

Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications · 2025 · 7 cites

Adaptive Oscillatory Inductive Bias for Modeling Sharp Prosodic Dynamics in Diffusion-Based TTS · 2026

Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding · 2025 · 2 cites

MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control · 2026

Beyond Two-stage Diffusion TTS: Joint Structure and Content Refinement via Jump Diffusion · 2026

Assessing the Ability of Neural TTS Systems to Model Consonant-Induced F0 Perturbation · 2026

ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning · 2025

FNH-TTS: Mixture-of-Experts Duration Modeling for Robust Neural Speech Synthesis · 2025

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality · 2022 · 35 cites

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models · 2023 · 23 cites

Guided-TTS: A Diffusion Model for Text-to-Speech via Classifier Guidance · 2021 · 20 cites

DiffVoice: Text-to-Speech with Latent Diffusion · 2023 · 18 cites

Unsupervised word-level prosody tagging for controllable speech synthesis · 2022 · 10 cites

ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech · 2022 · 5 cites

Enhancing Gappy Speech Audio Signals with Generative Adversarial Networks · 2023 · 3 cites

Representative Subset Selection for Efficient Fine-Tuning in Self-Supervised Speech Recognition · 2022 · 2 cites

Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation · 2023 · 2 cites

Energy-Based Models For Speech Synthesis · 2023 · 2 cites

Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis · 2023 · 2 cites

Federated Learning with Dynamic Transformer for Text to Speech · 2021 · 1 cite

JETS: Jointly Training FastSpeech2 and HiFi-GAN for End to End Text to Speech · 2022 · 1 cite

U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech · 2023 · 1 cite

Rep2wav: Noise Robust text-to-speech Using self-supervised representations · 2023 · 1 cite

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform · 2023 · 1 cite

IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS · 2024 · 1 cite

Continual Learning in Machine Speech Chain Using Gradient Episodic Memory · 2024 · 1 cite

Mixer-TTS: non-autoregressive, fast and compact text-to-speech model conditioned on language model embeddings · 2021

SOMOS: The Samsung Open MOS Dataset for the Evaluation of Neural Text-to-Speech Synthesis · 2022

Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech · 2022

Improved Consistency Training for Semi-Supervised Sequence-to-Sequence ASR via Speech Chain Reconstruction and Self-Transcribing · 2022

Unsupervised ASR via Cross-Lingual Pseudo-Labeling · 2023

Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation · 2023

ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech · 2023

AttentionStitch: How Attention Solves the Speech Editing Problem · 2024

Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness · 2024

Sequence-to-sequence models in peer-to-peer learning: A practical application · 2024

PRESENT: Zero-Shot Text-to-Prosody Control · 2024
