LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
2023 · Le Zhuo, Ruibin Yuan, Jiahao Pan, et al.
Abstract
We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic lyrics transcription method achieving state-of-the-art performance on various lyrics transcription datasets, even in challenging genres such as rock and metal. Our novel, training-free approach utilizes Whisper, a weakly supervised robust speech recognition model, and GPT-4, today's most performant chat-based large language model. In the proposed method, Whisper functions as the "ear" by transcribing the audio, while GPT-4 serves as the "brain," acting as an annotator with a strong performance for contextualized output selection and correction. Our experiments show that LyricWhiz significantly reduces Word Error Rate compared to existing methods in English and can effectively transcribe lyrics across multiple languages. Furthermore, we use LyricWhiz to create the first publicly available, large-scale, multilingual lyrics transcription dataset with a CC-BY-NC-SA copyright license, based on MTG-Jamendo, and offer a h
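The abstract reports results as Word Error Rate (WER). As a minimal sketch of that metric, the function below computes the word-level edit distance (substitutions, deletions, insertions) between a reference lyric and a transcription, divided by the number of reference words. The function name `word_error_rate` and the normalization (lowercasing and whitespace tokenization) are assumptions for illustration; the text normalization actually used in the paper's evaluation is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: text is lowercased and split on whitespace; a real
    evaluation may also strip punctuation or apply other normalization.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words yields an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of five reference words.
    print(word_error_rate("hello darkness my old friend",
                          "hello darkness my friend"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the transcription contains many inserted words.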
Related papers
- Adapting Pretrained Speech Model For Mandarin Lyrics Transcription And Alignment (2023)
- Liwhiz: A Non-intrusive Lyric Intelligibility Prediction System For The Cadenza Challenge (2025)
- Leveraging Whisper Embeddings For Audio-based Lyrics Matching (2025)
- Songtrans: An Unified Song Transcription And Alignment Method For Lyrics And Notes (2024)
- Contrastive Learning-based Audio To Lyrics Alignment For Multiple Languages (2023)
- A Study On Zero-shot Non-intrusive Speech Assessment Using Large Language Models (2024)
- Towards Building An End-to-end Multilingual Automatic Lyrics Transcription Model (2024)
- End-to-end Lyrics Alignment For Polyphonic Music Using An Audio-to-character Recognition Model (2019)