Lyrics-to-audio Alignment By Unsupervised Discovery Of Repetitive Patterns In Vowel Acoustics
2017 · Sungkyun Chang, Kyogu Lee
Abstract
Most previous approaches to lyrics-to-audio alignment used a pre-developed automatic speech recognition (ASR) system, which inherently had difficulty adapting the speech model to individual singers. A significant aspect missing from previous work is the self-learnability of repetitive vowel patterns in the singing voice, where the vowel part is more consistent than the consonant part. Based on this, our system first learns a discriminative subspace of vowel sequences using weighted symmetric non-negative matrix factorization (WS-NMF), taking the self-similarity of a standard acoustic feature as input. Then, we use canonical time warping (CTW), derived from a recent computer vision technique, to find an optimal spatiotemporal transformation between the text and the acoustic sequences. Experiments with Korean and English data sets showed that deploying this method after a pre-developed, unsupervised, singing source separation achieved more pro
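
To make the factorization step in the abstract more concrete, here is a minimal sketch, not the authors' implementation. It builds a cosine self-similarity matrix from toy frame features and factorizes it with plain symmetric NMF (a damped multiplicative update), which is an assumed stand-in for the weighted variant (WS-NMF) named above. The function names, the toy "vowel" prototypes, and all parameter values are hypothetical; the CTW alignment step is not shown.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity of a (frames x dims) feature matrix, clipped to [0, 1]."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return np.clip(unit @ unit.T, 0.0, 1.0)

def symmetric_nmf(A, rank, n_iter=200, beta=0.5, seed=0):
    """Approximate a non-negative symmetric matrix A by H @ H.T with H >= 0.

    Plain (unweighted) symmetric NMF with a damped multiplicative update;
    an illustrative substitute for WS-NMF, not the paper's algorithm.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], rank))
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + 1e-12  # small constant avoids division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

# Toy data: two hypothetical "vowel" prototypes that repeat over time.
prototypes = np.array([[1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
                       [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]])
labels = np.array([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1])
noise = np.abs(np.random.default_rng(1).standard_normal((labels.size, 6)))
frames = prototypes[labels] + 0.01 * noise

S = self_similarity(frames)        # (12 x 12) self-similarity matrix
H = symmetric_nmf(S, rank=2)       # each column acts as a soft pattern indicator
clusters = H.argmax(axis=1)        # hard assignment of frames to repeated patterns
rel_error = np.linalg.norm(S - H @ H.T) / np.linalg.norm(S)
```

In this toy setting, frames that share a prototype should load on the same column of `H`, which is the sense in which repeated patterns can be discovered from self-similarity alone, without labels.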
Related papers
- End-to-end Lyrics Alignment For Polyphonic Music Using An Audio-to-character Recognition Model (2019) 13.11
- Acoustic Modeling For Automatic Lyrics-to-audio Alignment (2019) 8.60
- Contrastive Learning-based Audio To Lyrics Alignment For Multiple Languages (2023) 6.77
- HCLAS-X: Hierarchical And Cascaded Lyrics Alignment System Using Multimodal Cross-correlation (2023) 0.00
- Content Based Singing Voice Source Separation Via Strong Conditioning Using Aligned Phonemes (2020) 0.00
- A Real-time Lyrics Alignment System Using Chroma And Phonetic Features For Classical Vocal Performance (2024) 2.26
- Songtrans: An Unified Song Transcription And Alignment Method For Lyrics And Notes (2024) 0.00
- TIPAA-SSL: Text Independent Phone-to-audio Alignment Based On Self-supervised Learning And Knowledge Transfer (2024) 0.00