Transcribing Lyrics From Commercial Song Audio: The First Step Towards Singing Content Processing
2018 · Che-Ping Tsai, Yi-Lin Tuan, Lin-Shan Lee
Abstract
Spoken content processing (such as retrieval and browsing) is maturing, but singing content is still almost completely left out. Songs are human voice carrying plenty of semantic information, just as speech does, and may be considered a special type of speech with highly flexible prosody. The various problems in song audio, for example phone durations that change significantly over highly flexible pitch contours, make recognizing lyrics from song audio much more difficult than recognizing ordinary speech. This paper reports an initial attempt towards this goal. We collected music-removed versions of English songs directly from commercial singing content. The best results were obtained by TDNN-LSTM with data augmentation using 3-fold speed perturbation plus some special approaches. The WER achieved (73.90%) was significantly lower than the baseline (96.21%), but still relatively high.
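For readers unfamiliar with the metric, word error rate (WER) is the word-level edit distance between the recognized text and the reference lyrics, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring pipeline; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions needed to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, WER can exceed 100%. Going from the 96.21% baseline to 73.90% is an absolute reduction of 22.31 points, or roughly 23% relative to the baseline.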
Related papers
- Songtrans: An Unified Song Transcription And Alignment Method For Lyrics And Notes (2024)
- Deep Audio-visual Singing Voice Transcription Based On Self-supervised Learning Models (2023)
- Songprep: A Preprocessing Framework And End-to-end Model For Full-song Structure Parsing And Lyrics Transcription (2025)
- Automatic Lyrics Transcription Using Dilated Convolutional Neural Networks With Self-attention (2020)
- Lyrics-to-audio Alignment By Unsupervised Discovery Of Repetitive Patterns In Vowel Acoustics (2017)
- Pdaugment: Data Augmentation By Pitch And Duration Adjustments For Automatic Lyrics Transcription (2021)
- End-to-end Lyrics Alignment For Polyphonic Music Using An Audio-to-character Recognition Model (2019)
- Exploiting Synchronized Lyrics And Vocal Features For Music Emotion Detection (2019)