SongTrans: A Unified Song Transcription and Alignment Method for Lyrics and Notes

Abstract

The quantity of processed data is crucial for advancing the field of singing voice synthesis. While tools are available for lyric or note transcription, they all require pre-processed data, and this pre-processing (e.g., vocal and accompaniment separation) is relatively time-consuming. Moreover, most of these tools are designed to address a single task and struggle to align lyrics and notes (i.e., to identify the notes that correspond to each word of the lyrics). To address these challenges, we first design a pipeline by optimizing existing tools and use it to annotate a large number of lyric-note pairs from songs. Then, based on the annotated data, we train a unified SongTrans model that directly transcribes lyrics and notes while simultaneously aligning them, without requiring the songs to be pre-processed. Our SongTrans model consists of two modules: (1) the Autoregressive module predicts the lyrics, along with the duration and note number corresponding to each word in the lyrics. (2) the Non-autoregre
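To make the word-level alignment target more concrete, here is a minimal sketch under stated assumptions. It reads "note number" as the count of notes a word spans (an assumption; the abstract does not define it) and shows how such per-word counts would split a flat note sequence into per-word groups. The names `Note`, `WordPrediction`, and `align_words_to_notes`, as well as the use of MIDI pitch numbers and seconds, are hypothetical illustrations and are not taken from SongTrans.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Note:
    pitch: int        # hypothetical: MIDI pitch number
    duration: float   # hypothetical unit: seconds


@dataclass
class WordPrediction:
    word: str
    duration: float   # time the word is sung (assumed unit: seconds)
    note_count: int   # assumed reading of "note number": notes spanned by this word


def align_words_to_notes(
    words: List[WordPrediction], notes: List[Note]
) -> List[Tuple[str, List[Note]]]:
    """Split a flat note sequence into consecutive per-word groups using each word's note count."""
    total = sum(w.note_count for w in words)
    if total != len(notes):
        raise ValueError(f"note counts sum to {total}, but {len(notes)} notes were given")
    aligned = []
    start = 0
    for w in words:
        end = start + w.note_count
        # Each word takes the next `note_count` notes in order (e.g. a melisma spans several).
        aligned.append((w.word, notes[start:end]))
        start = end
    return aligned


if __name__ == "__main__":
    words = [WordPrediction("shine", 0.5, 2), WordPrediction("on", 0.5, 1)]
    notes = [Note(67, 0.25), Note(69, 0.25), Note(67, 0.5)]
    for word, group in align_words_to_notes(words, notes):
        print(word, [n.pitch for n in group])
```

In this illustration, "shine" is sung over two notes and "on" over one, so the per-word counts alone recover which notes belong to which word. A real system would also need to reconcile word durations with note durations, which this sketch deliberately leaves out.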
