LJSpeech
Canonical
34 papers using it
First seen: 2022
Papers using LJSpeech (34)
- SOMOS: The Samsung Open MOS Dataset For The Evaluation Of Neural Text-to-speech Synthesis
- Styletts 2: Towards Human-level Text-to-speech Through Style Diffusion And Adversarial Training With Large Speech Language Models
- Reflow-tts: A Rectified Flow Model For High-fidelity Text-to-speech
- Unsupervised Word-level Prosody Tagging For Controllable Speech Synthesis
- Lightweight End-to-end Text-to-speech Synthesis for low resource on-device applications
- Clip-TTS: Contrastive Text-content and Mel-spectrogram, A High-Quality Text-to-Speech Method based on Contextual Semantic Understanding
- Continual Learning In Machine Speech Chain Using Gradient Episodic Memory
- PRESENT: Zero-shot Text-to-prosody Control
- Indicvoices-r: Unlocking A Massive Multilingual Multi-speaker Speech Corpus For Scaling Indian TTS
- MambaVoiceCloning: Efficient and Expressive Text-to-Speech via State-Space Modeling and Diffusion Control
- Beyond Two-stage Diffusion TTS: Joint Structure and Content Refinement via Jump Diffusion
- ECTSpeech: Enhancing Efficient Speech Synthesis via Easy Consistency Tuning
- FNH-TTS: A Fast, Natural, and Human-Like Speech Synthesis System with advanced prosodic modeling based on Mixture of Experts
- Hiftnet: A Fast High-quality Neural Vocoder With Harmonic-plus-noise Filter And Inverse Short Time Fourier Transform
- Sequence-to-sequence Models In Peer-to-peer Learning: A Practical Application
- StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models
- ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
- Low-Resource Text-to-Speech Using Specific Data and Noise Augmentation
- Energy-Based Models For Speech Synthesis
- Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
- U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
- Rep2wav: Noise Robust text-to-speech Using self-supervised representations
- HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform
- Continual Learning in Machine Speech Chain Using Gradient Episodic Memory
- DiffVoice: Text-to-Speech with Latent Diffusion
- Enhancing Gappy Speech Audio Signals with Generative Adversarial Networks
- Unsupervised ASR via Cross-Lingual Pseudo-Labeling
- Arabic Dysarthric Speech Recognition Using Adversarial and Signal-Based Augmentation
- ReFlow-TTS: A Rectified Flow Model for High-fidelity Text-to-Speech
- AttentionStitch: How Attention Solves the Speech Editing Problem
- Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
- Sequence-to-sequence models in peer-to-peer learning: A practical application
- PRESENT: Zero-Shot Text-to-Prosody Control
- IndicVoices-R: Unlocking a Massive Multilingual Multi-speaker Speech Corpus for Scaling Indian TTS