Tamil

Emerging

9 papers using it

2021 first seen

The 'Tamil' dataset/benchmark contains speech data used to evaluate automatic speech recognition (ASR) systems through fine-grained, part-of-speech (PoS)-wise error analysis, with a particular focus on aligning ASR hypotheses with reference transcriptions in non-Latin scripts.

Papers using Tamil (9)

MultiGen: Child-Friendly Multilingual Speech Generator with LLMs · 2025 · 3 cites

Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts · 2026

PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech · 2026

Goodness-of-pronunciation without phoneme time alignment · 2026

Dynamic Multi-Expert Projectors with Stabilized Routing for Multilingual Speech Recognition · 2026

Data and knowledge-driven approaches for multilingual training to improve the performance of speech recognition systems of Indian languages · 2022 · 3 cites

Intent Classification Using Pre-trained Language Agnostic Embeddings For Low Resource Languages · 2021

Textless NLP -- Zero Resource Challenge with Low Resource Compute · 2024

Multistage Fine-tuning Strategies for Automatic Speech Recognition in Low-resource Languages · 2024