Automatic Speech Recognition in Healthcare in the Post-LLM Era: A Scoping Review Protocol

Abstract

Context: Automatic Speech Recognition (ASR) in healthcare is undergoing a significant shift driven by the integration of Large Language Models (LLMs). While traditional ASR focused on transcription fidelity, LLM-based systems extend this capability to reasoning over, summarizing, and structuring clinical data. This scoping review maps the emerging landscape of LLM-based ASR in healthcare, examining its applications, technical foundations, evaluation practices, and reported challenges. Methods: Following PRISMA-ScR guidelines, we searched multiple databases for studies published between January 2022 and December 2025, restricting inclusion to peer-reviewed, open-access work to ensure reproducibility and accessibility. Results: Of 384 screened records, nineteen studies met the inclusion criteria. Administrative documentation was the most common application (42.1%), followed by diagnosis, therapy, and doctor–patient communication. Whisper was the dominant ASR model (52.6%), typically paired with GPT-family or LLaMA-family LLMs in frozen configurations steered through prompting. LLMs served as the primary component in 68.4% of studies. ASR evaluation relied predominantly on word error rate, whereas LLM evaluation was fragmented, with no standard metric. Studies reported documentation time reductions of 30–90%; however, privacy reporting was inconsistent, equity concerns were rarely tested systematically, and only five studies provided replication packages. Conclusions: LLM-based ASR shows potential for reducing documentation burden and supporting clinical workflows, but gaps in evaluation standardization, equity testing, and reproducibility must be addressed before safe clinical deployment.
