Abstract

Indigenous languages are a fundamental legacy in the development of human communication, embodying the unique identity and culture of local communities in America. The Second AmericasNLP (Americas Natural Language Processing) Competition Track 1 of NeurIPS (Neural Information Processing Systems) 2022 proposed the task of training automatic speech recognition (ASR) systems for five Indigenous languages: Quechua, Guarani, Bribri, Kotiria, and Wa'ikhana. In this paper, we describe the fine-tuning of a state-of-the-art ASR model for each target language, using approximately 36.65 h of transcribed speech data from diverse sources enriched with data augmentation methods. We systematically investigate, using a Bayesian search, the impact of the different hyperparameters on the Wav2vec2.0 XLS-R (Cross-Lingual Speech Representations) variants of 300 M and 1 B parameters. Our findings indicate that data and detailed hyperparameter tuning significantly affect ASR accuracy, but language complexity

Authors

(none)

Tags

  • Speech Recognition
  • Speech Translation

Stats

Related papers