Investigating Whisper's Prompt to Enhance Indian Accent Speech Recognition

Abstract

The state-of-the-art Whisper automatic speech recognition (ASR) model delivers impressive transcription performance on speech recorded across different domains. The large amount of data used during training makes the model domain-invariant, which improves overall performance. However, Whisper's transcription performance still varies across speech dialects and named entities. Along with speech, the Whisper model is also designed to receive textual prompts that guide the transcription process. This paper investigates the use of Whisper's prompt to improve the transcription of Indian-accented English. We analyse different prompt styles on an Indian English database and investigate how to maximise the effect of prompts on the transcription of named entities and accented speech. We also show the effect of the optimum prompt style on different variants of the Whisper model and show that performance improves by a large margin (around 15%). Our experiments aim at an improved understanding of the Whisper prompt and how to utilise it with maximum efficiency.
