Mandarin
Emerging · 21 papers using it
First seen: 2022
The 'Mandarin' dataset/benchmark contains speech data from various Mandarin dialects and is used to evaluate a unified cross-dialect speech recognition framework based on a tonal Pinyin intermediate representation.
Papers using Mandarin (20)
- Analyzing Acoustic Word Embeddings From Pre-trained Self-supervised Speech Models
- Period Singer: Integrating Periodic And Aperiodic Variational Autoencoders For Natural-sounding End-to-end Singing Voice Synthesis
- Toward Unified Chinese Multi-Dialectal Speech Recognition via Pinyin Intermediate Representation
- Prosodic ABX: A Language-Agnostic Method for Measuring Prosodic Contrast in Speech Representations
- Rethinking Entropy Allocation in LLM-based ASR: Understanding the Dynamics between Speech Encoders and LLMs
- SITA: Learning Speaker-Invariant and Tone-Aware Speech Representations for Low-Resource Tonal Languages
- Unsupervised lexicon learning from speech is limited by representations rather than clustering
- HENT-SRT: Hierarchical Efficient Neural Transducer with Self-Distillation for Joint Speech Recognition and Translation
- A Self-Refining Framework for Enhancing ASR Using TTS-Synthesized Data
- LA-RAG: Enhancing LLM-based ASR Accuracy With Retrieval-augmented Generation
- Do Discrete Self-supervised Representations Of Speech Capture Tone Distinctions?
- Text Enhancement For Paragraph Processing In End-to-end Code-switching TTS
- Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models
- Multi-pass Training and Cross-information Fusion for Low-resource End-to-end Accented Speech Recognition
- Effects of Convolutional Autoencoder Bottleneck Width on StarGAN-based Singing Technique Conversion
- Accent-VITS: accent transfer for end-to-end TTS
- Period Singer: Integrating Periodic and Aperiodic Variational Autoencoders for Natural-Sounding End-to-End Singing Voice Synthesis
- PRESENT: Zero-Shot Text-to-Prosody Control
- LA-RAG: Enhancing LLM-based ASR Accuracy with Retrieval-Augmented Generation
- Do Discrete Self-Supervised Representations of Speech Capture Tone Distinctions?