ReconVAT: A Semi-supervised Automatic Music Transcription Framework For Low-resource Real-world Data
2021 · Kin Wai Cheuk, Dorien Herremans, Li Su
Abstract
Most current supervised automatic music transcription (AMT) models generalize poorly: they have trouble transcribing real-world music recordings from musical genres that are not represented in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which addresses this issue by leveraging the large amount of available unlabelled music recordings. ReconVAT combines a reconstruction loss with virtual adversarial training. When paired with existing U-net models for AMT, ReconVAT achieves competitive results on common benchmark datasets such as MAPS and MusicNet. For example, in the few-shot setting for the string part version of MusicNet, ReconVAT achieves F1-scores of 61.0% and 41.6% on the note-wise and note-with-offset-wise metrics respectively, which translates into relative improvements of 22.2% and 62.5% over the supervised baseline model. Our proposed framework also demonstrates the potential of con
Related papers
- Annotation-free Automatic Music Transcription With Scalable Synthetic Data And Adversarial Domain Confusion (2023)
- Invariances And Data Augmentation For Supervised Music Transcription (2017)
- Audio-to-score Alignment Of Piano Music Using RNN-based Automatic Music Transcription (2017)
- YourMT3+: Multi-instrument Music Transcription With Enhanced Transformer Architectures And Cross-dataset Stem Augmentation (2024)
- Audio-to-score Alignment Using Deep Automatic Music Transcription (2021)
- Adversarial Learning For Improved Onsets And Frames Music Transcription (2019)
- Deep Audio-visual Singing Voice Transcription Based On Self-supervised Learning Models (2023)
- MSTRE-Net: Multistreaming Acoustic Modeling For Automatic Lyrics Transcription (2021)