Learning Frame Similarity Using Siamese Networks For Audio-to-score Alignment
2020 · Ruchit Agrawal, Simon Dixon
Abstract
Audio-to-score alignment aims to generate an accurate mapping between the audio of a performance and the score of a given piece. Standard alignment methods are based on Dynamic Time Warping (DTW) and employ handcrafted features, which cannot be adapted to different acoustic conditions. We propose a method to overcome this limitation using learned frame similarity for audio-to-score alignment. We focus on offline audio-to-score alignment of piano music. Experiments on music data from different acoustic conditions demonstrate that our method achieves higher alignment accuracy than a standard DTW-based method that uses handcrafted features, and generates robust alignments whilst remaining adaptable to different domains.
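To make the DTW baseline mentioned in the abstract concrete, the following is a minimal sketch of offline alignment over a pairwise frame-distance matrix. It is not the authors' implementation: the function names `cosine_distance_matrix` and `dtw_path` are hypothetical, and cosine distance is only a stand-in for the frame similarity that, in the proposed method, would come from a trained Siamese network.

```python
import numpy as np


def cosine_distance_matrix(X, Y):
    """Pairwise cosine distance between rows of X (n, d) and Y (m, d).

    Assumption: a handcrafted stand-in for a learned frame similarity.
    """
    Xn = X / (np.linalg.norm(X, axis=1, keepdims=True) + 1e-9)
    Yn = Y / (np.linalg.norm(Y, axis=1, keepdims=True) + 1e-9)
    return 1.0 - Xn @ Yn.T


def dtw_path(D):
    """Classic DTW on a distance matrix D of shape (n, m).

    Returns the optimal warping path as a list of (i, j) index pairs
    and the accumulated cost of that path.
    """
    n, m = D.shape
    # C[i, j] holds the minimal accumulated cost of aligning the first
    # i frames of one sequence with the first j frames of the other.
    C = np.full((n + 1, m + 1), np.inf)
    C[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            C[i, j] = D[i - 1, j - 1] + min(
                C[i - 1, j - 1], C[i - 1, j], C[i, j - 1]
            )

    # Backtrack from the end of both sequences to the start.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (C[i - 1, j - 1], i - 1, j - 1),  # diagonal step
            (C[i - 1, j], i - 1, j),          # advance first sequence only
            (C[i, j - 1], i, j - 1),          # advance second sequence only
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return path[::-1], C[n, m]


if __name__ == "__main__":
    # Toy example: "score" frames and a "performance" that holds the
    # first frame for twice as long.
    score = np.eye(3)
    performance = np.eye(3)[[0, 0, 1, 2]]
    path, cost = dtw_path(cosine_distance_matrix(score, performance))
    print(path, cost)
```

Under this sketch, adapting to a new acoustic condition would mean swapping `cosine_distance_matrix` for a function that scores frame pairs with a learned model, while the DTW step stays unchanged.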
Related papers
- Audio-to-score Alignment Of Piano Music Using RNN-based Automatic Music Transcription (2017)
- A Convolutional-attentional Neural Framework For Structure-aware Performance-score Synchronization (2022)
- Structure-aware Audio-to-score Alignment Using Progressively Dilated Convolutional Neural Networks (2021)
- Just Label The Repeats For In-the-wild Audio-to-score Alignment (2024)
- Audio-to-score Alignment Using Deep Automatic Music Transcription (2021)
- Audio-to-score Alignment Using Transposition-invariant Features (2018)
- Unsupervised Feature Learning For Speech Using Correspondence And Siamese Networks (2020)
- Multi-modal Conditional Bounding Box Regression For Music Score Following (2021)