English
Emerging (37 papers using it)
First seen: 2022
The 'English' dataset/benchmark provides data for multilingual training of text-to-speech systems and is used to evaluate cross-lingual speaker transfer methods that anonymize speaker identities.
Papers using English (37)
- Analyzing Acoustic Word Embeddings From Pre-trained Self-supervised Speech Models
- Syllable Discovery And Cross-lingual Generalization In A Visually Grounded, Self-supervised Speech Model
- Learning Multilingual Expressive Speech Representation For Prosody Prediction Without Parallel Data
- LLM-based Generative Error Correction for Rare Words with Synthetic Data and Phonetic Context
- Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts
- Accent-Invariant Automatic Speech Recognition via Saliency-Driven Spectrogram Masking
- Word stress in self-supervised speech models: A cross-linguistic comparison
- Exploring Cross-Lingual Voice Conversion Methods for Anonymizing Low-Resource Text-to-Speech
- Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs
- An Ultra-Low Latency, End-to-End Streaming Speech Synthesis Architecture via Block-Wise Generation and Depth-Wise Codec Decoding
- Utterance-Level Methods for Identifying Reliable ASR-Output for Child Speech
- findsylls: A Language-Agnostic Toolkit for Syllable-Level Speech Tokenization and Embedding
- E2E-VGuard: Adversarial Prevention for Production LLM-based End-To-End Speech Synthesis
- Unsupervised lexicon learning from speech is limited by representations rather than clustering
- Parallel GPT: Harmonizing the Independence and Interdependence of Acoustic and Semantic Information for Zero-Shot Text-to-Speech
- Evaluating Standard And Dialectal Frisian ASR: Multilingual Fine-tuning And Language Identification For Improved Low-resource Performance
- Speechdialoguefactory: Generating High-quality Speech Dialogue Data To Accelerate Your Speech-llm Development
- SpeechDialogueFactory: Generating High-Quality Speech Dialogue Data to Accelerate Your Speech-LLM Development
- Grad-stylespeech: Any-speaker Adaptive Text-to-speech Synthesis With Diffusion Models
- Paraformer-v2: An Improved Non-autoregressive Transformer For Noise-robust Speech Recognition
- Visually Grounded Keyword Detection And Localisation For Low-resource Languages
- Text Enhancement For Paragraph Processing In End-to-end Code-switching TTS
- Leveraging supplementary text data to kick-start automatic speech recognition system development with limited transcriptions
- Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models
- TTS-Guided Training for Accent Conversion Without Parallel Data
- Syllable Discovery and Cross-Lingual Generalization in a Visually Grounded, Self-Supervised Speech Model
- Paraformer-v2: An improved non-autoregressive transformer for noise-robust speech recognition
- Advocating Character Error Rate for Multilingual ASR Evaluation
- STTATTS: Unified Speech-To-Text And Text-To-Speech Model
- Visually Grounded Keyword Detection and Localisation for Low-Resource Languages
- Learning Multilingual Expressive Speech Representation for Prosody Prediction without Parallel Data
- Zero Resource Cross-Lingual Part Of Speech Tagging
- Contextualized Automatic Speech Recognition with Dynamic Vocabulary
- VECL-TTS: Voice identity and Emotional style controllable Cross-Lingual Text-to-Speech
- Enhancing Polyglot Voices by Leveraging Cross-Lingual Fine-Tuning in Any-to-One Voice Conversion
- Textless NLP -- Zero Resource Challenge with Low Resource Compute
- LightGrad: Lightweight Diffusion Probabilistic Model for Text-to-Speech