Knowledge Distillation For Singing Voice Detection
2020 · Soumava Paul, Gurunath Reddy M, K Sreenivasa Rao, et al.
Abstract
Singing Voice Detection (SVD) has been an active area of research in music information retrieval (MIR). Currently, two deep neural network-based methods exist in the literature, one based on a CNN and the other on an RNN, that learn optimized features for the voice detection (VD) task and achieve state-of-the-art performance on common datasets. Both models have a large number of parameters (1.4M for the CNN and 65.7K for the RNN) and are hence not suitable for deployment on devices such as smartphones or embedded sensors, which have limited memory and computation power. In the deep learning literature, the most popular method for addressing this issue, alongside model compression, is knowledge distillation, in which a large pre-trained network (the teacher) is used to train a smaller student network. Despite the wide applications of SVD in music information retrieval, to the best of our knowledge, model compression for practical deployment has not yet been explored. In this paper, eff
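The teacher-student idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used soft-target formulation of knowledge distillation (temperature-softened teacher outputs combined with the ground-truth label); the listing above does not say which loss the paper actually uses. The function names, the two-class logits (voice / no voice), and the `temperature` and `alpha` values are all hypothetical and chosen only for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic soft-target distillation loss for one frame (an assumed formulation).

    Combines KL(teacher || student) on temperature-softened outputs, scaled by
    temperature**2, with the usual cross-entropy against the hard label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: how far the student's softened output is from the teacher's.
    kl = sum(t * math.log(t / s)
             for t, s in zip(p_teacher, p_student) if t > 0)
    # Hard-label term: cross-entropy of the unsoftened student output.
    ce = -math.log(softmax(student_logits)[label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical logits for one audio frame: index 0 = "voice", index 1 = "no voice".
teacher = [2.0, -1.0]
student = [0.5, 0.1]
loss = distillation_loss(student, teacher, label=0)
```

In this formulation, `alpha` trades off imitation of the teacher against fitting the labels, and a higher `temperature` exposes more of the teacher's relative confidence between classes.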