SongTrans: An Unified Song Transcription And Alignment Method For Lyrics And Notes
2024 · Siwei Wu, Jinzheng He, Ruibin Yuan, et al.
Abstract
The quantity of processed data is crucial for advancing the field of singing voice synthesis. While tools exist for lyric or note transcription, they all require pre-processed data, and the pre-processing (e.g., vocal and accompaniment separation) is relatively time-consuming. Moreover, most of these tools are designed to address a single task and struggle to align lyrics and notes (i.e., to identify the notes corresponding to each word in the lyrics). To address these challenges, we first design a pipeline by optimizing existing tools and annotating numerous lyric-note pairs of songs. Then, based on the annotated data, we train a unified SongTrans model that can directly transcribe lyrics and notes while aligning them simultaneously, without requiring the songs to be pre-processed. Our SongTrans model consists of two modules: (1) the Autoregressive module predicts the lyrics, along with the duration and note number corresponding to each word in a lyric. (2) the Non-autoregre
Related papers
- Translate The Beauty In Songs: Jointly Learning To Align Melody And Translate Lyrics (2023) 3.58
- Songprep: A Preprocessing Framework And End-to-end Model For Full-song Structure Parsing And Lyrics Transcription (2025) 0.00
- End-to-end Lyrics Alignment For Polyphonic Music Using An Audio-to-character Recognition Model (2019) 13.11
- Deep Audio-visual Singing Voice Transcription Based On Self-supervised Learning Models (2023) 0.00
- Note-level Singing Melody Transcription For Time-aligned Musical Score Generation (2025) 5.24
- Lyrics-to-audio Alignment By Unsupervised Discovery Of Repetitive Patterns In Vowel Acoustics (2017) 6.34
- Songglm: Lyric-to-melody Generation With 2D Alignment Encoding And Multi-task Pre-training (2024) 3.58
- Songgen: A Single Stage Auto-regressive Transformer For Text-to-song Generation (2025) 4.98