Transfer Learning For Improving Singing-voice Detection In Polyphonic Instrumental Music
2020 · Yuanbo Hou, Frank K. Soong, Jian Luan, et al.
Abstract
Detecting singing voice in polyphonic instrumental music is critical to music information retrieval. Training a robust vocal detector requires a large dataset labeled as vocal or non-vocal at the frame level. However, frame-level labeling is time-consuming and labor-intensive, so few well-labeled datasets are available for singing-voice detection (S-VD). Hence, we propose a data augmentation method for S-VD based on transfer learning. In this study, clean speech clips with voice activity endpoints and separate instrumental music clips are artificially mixed to simulate polyphonic vocals for training a vocal/non-vocal detector. Because articulation and phonation differ between speaking and singing, the vocal detector trained on the artificial dataset does not match well with real polyphonic music, in which singing vocals are accompanied by instruments. To reduce this mismatch, transfer learning is used to transfer the knowledge learned from the
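The mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `mix_with_frame_labels`, the 20 ms frame length, the SNR-based gain, and the majority-vote labeling rule are all assumptions made for illustration. It adds a speech clip to a music clip and turns the voice activity endpoints into frame-level vocal/non-vocal labels.

```python
import numpy as np


def mix_with_frame_labels(speech, music, voiced_segments, sr=16000,
                          frame_len=0.02, snr_db=0.0):
    """Mix speech into music and derive frame-level vocal/non-vocal labels.

    voiced_segments: list of (start_sec, end_sec) voice activity endpoints.
    All parameter names and defaults here are illustrative assumptions.
    """
    n = min(len(speech), len(music))
    speech = np.asarray(speech[:n], dtype=np.float64)
    music = np.asarray(music[:n], dtype=np.float64)

    # Sample-level voice activity mask built from the endpoints.
    active = np.zeros(n, dtype=bool)
    for start, end in voiced_segments:
        active[int(round(start * sr)):int(round(end * sr))] = True

    # Scale the speech so its power over voiced samples sits snr_db
    # above the average music power (assumed mixing rule).
    p_music = np.mean(music ** 2)
    p_speech = np.mean(speech[active] ** 2) if active.any() else 0.0
    if p_speech > 0.0:
        gain = np.sqrt(p_music * 10.0 ** (snr_db / 10.0) / p_speech)
    else:
        gain = 0.0
    mixture = music + gain * speech

    # A frame is labeled vocal when at least half of its samples are voiced.
    hop = int(round(frame_len * sr))
    n_frames = n // hop
    frames = active[:n_frames * hop].reshape(n_frames, hop)
    labels = frames.mean(axis=1) >= 0.5
    return mixture, labels
```

A detector trained on such mixtures sees speech rather than singing, which is the mismatch the abstract says transfer learning is meant to reduce.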
Related papers
- Investigation Of Singing Voice Separation For Singing Voice Detection In Polyphonic Music (2020) 5.84
- Jointly Detecting And Separating Singing Voice: A Multi-task Approach (2018) 7.81
- Knowledge Distillation For Singing Voice Detection (2020) 5.24
- Deep Audio-visual Singing Voice Transcription Based On Self-supervised Learning Models (2023) 0.00
- VISinger2+: End-to-end Singing Voice Synthesis Augmented By Self-supervised Learning Representation (2024) 4.52
- SingAug: Data Augmentation For Singing Voice Synthesis With Cycle-consistent Training Strategy (2022) 7.16
- Singing Voice Separation: A Study On Training Data (2019) 10.07
- Acoustic Modeling For Automatic Lyrics-to-audio Alignment (2019) 8.60