Abstract
Automatic Speech Recognition (ASR) systems perform well on high-resource languages but remain unreliable for regional dialects with limited data. The Makassar dialect, spoken by approximately nine million people in South Sulawesi, Indonesia, poses distinctive phonetic and grammatical challenges, including high-frequency particles such as mi, ji, and ko. This study presents the first benchmark evaluation of state-of-the-art ASR models on Makassar speech. Four models, Whisper (tiny, base, and small) and Wav2Vec2 Large XLSR Indonesian, were evaluated on 305 spontaneous utterances ($\sim 10$ minutes) from 10 native speakers. Results show severe performance degradation: even the best model (Wav2Vec2 Indonesian) reached a word error rate (WER) of 87.73%, corresponding to only 12.27% word accuracy. Error analysis reveals two dominant failure modes, Dialect Particle Blindness (average particle detection rate of 2.9%) and Systematic Phonetic Mismatch (89 vowel confusions), indicating that current models treat dialectal features as noise. These findings underscore the urgent need for dialect-aware ASR adaptation and dataset development, and they provide a foundation for inclusive speech technology across Indonesia's linguistic diversity.
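The headline numbers above rely on two metrics: WER, where the reported accuracy is simply its complement (100% − 87.73% = 12.27%), and a particle detection rate. The sketch below illustrates one plausible way to compute both. It is not the study's evaluation code. It assumes whitespace tokenization with no text normalization and pools errors over the whole corpus to compute WER. The `particle_detection_rate` definition, which counts the share of reference particle tokens that also appear in the hypothesis, is a hypothetical formulation chosen for illustration.

```python
from collections import Counter


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion of a reference word
                          curr[j - 1] + 1,     # insertion of a hypothesis word
                          prev[j - 1] + cost)  # substitution or exact match
        prev = curr
    return prev[-1]


def wer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level WER: total word edits divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return errors / total


def particle_detection_rate(refs: list[str], hyps: list[str],
                            particles: tuple[str, ...] = ("mi", "ji", "ko")) -> float:
    """Hypothetical metric: fraction of reference particle tokens that also
    occur in the paired hypothesis (counts clipped per utterance)."""
    found = total = 0
    for r, h in zip(refs, hyps):
        ref_counts = Counter(w for w in r.split() if w in particles)
        hyp_counts = Counter(w for w in h.split() if w in particles)
        total += sum(ref_counts.values())
        found += sum(min(n, hyp_counts[w]) for w, n in ref_counts.items())
    return found / total if total else 0.0
```

Because insertions count as errors, WER can exceed 100%, so "accuracy = 1 − WER" is a convenient summary rather than a bounded score. With this convention, a WER of 0.8773 maps to the reported 0.1227.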