Multi-modal Conditional Bounding Box Regression For Music Score Following
2021 · Florian Henkel, Gerhard Widmer
Abstract
This paper addresses the problem of sheet-image-based on-line audio-to-score alignment, also known as score following. Drawing inspiration from object detection, a conditional neural network architecture is proposed that directly predicts the (x, y) coordinates of the matching positions in a complete score sheet image at each point in time for a given musical performance. Experiments are conducted on a synthetic polyphonic piano benchmark dataset, and the new method is compared to several existing approaches from the literature for sheet-image-based score following as well as to an Optical Music Recognition baseline. The proposed approach achieves new state-of-the-art results and, by applying Impulse Responses as a data augmentation technique, significantly improves the alignment performance on a set of real-world piano recordings.
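The abstract names impulse responses as a data augmentation technique but does not describe the pipeline. As a rough illustration of the general idea only, the sketch below convolves a mono signal with an impulse response, trims the result to the input length, and rescales it to the input's peak level. The function names, the trimming, and the peak rescaling are assumptions made for this example, not details taken from the paper.

```python
def convolve_with_impulse_response(signal, impulse_response):
    """Full linear convolution of a mono signal with an impulse response."""
    n, m = len(signal), len(impulse_response)
    if n == 0 or m == 0:
        return []
    out = [0.0] * (n + m - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def augment(signal, impulse_response):
    """Apply the IR, trim to the input length, and match the input peak.

    Trimming and peak matching are illustrative choices (assumptions),
    not steps confirmed by the paper.
    """
    wet = convolve_with_impulse_response(signal, impulse_response)[:len(signal)]
    peak_in = max((abs(x) for x in signal), default=0.0)
    peak_out = max((abs(x) for x in wet), default=0.0)
    if peak_out == 0.0:
        return wet
    scale = peak_in / peak_out
    return [x * scale for x in wet]
```

In practice the impulse response would be a recorded room or microphone response, and an FFT-based convolution would be preferable for long audio; the direct double loop here is only meant to make the operation explicit.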
Related papers
- Audio-to-score Alignment Of Piano Music Using Rnn-based Automatic Music Transcription (2017)
- A Convolutional-attentional Neural Framework For Structure-aware Performance-score Synchronization (2022)
- Just Label The Repeats For In-the-wild Audio-to-score Alignment (2024)
- Structure-aware Audio-to-score Alignment Using Progressively Dilated Convolutional Neural Networks (2021)
- Coupled Recurrent Models For Polyphonic Music Composition (2018)
- A Holistic Approach To Polyphonic Music Transcription With Neural Networks (2019)
- Learning Frame Similarity Using Siamese Networks For Audio-to-score Alignment (2020)
- Audio-to-score Alignment Using Deep Automatic Music Transcription (2021)