Towards Improving Harmonic Sensitivity And Prediction Stability For Singing Melody Extraction
2023 · Keren Shao, Ke Chen, Taylor Berg-Kirkpatrick, et al.
Abstract
In deep learning research, many melody extraction models rely on redesigning neural network architectures to improve performance. In this paper, we propose an input feature modification and a training objective modification based on two assumptions. First, harmonics in the spectrograms of audio data decay rapidly along the frequency axis. To enhance the model's sensitivity to the trailing harmonics, we modify the Combined Frequency and Periodicity (CFP) representation using the discrete z-transform. Second, vocal and non-vocal segments of extremely short duration are uncommon. To ensure a more stable melody contour, we design a differentiable loss function that prevents the model from predicting such segments. We apply these modifications to several models, including MSNet, FTANet, and a newly introduced model, PianoNet, modified from a piano transcription network. Our experimental results demonstrate that the proposed modifications are empirically effective for singing melody extraction.
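To make the second assumption concrete, here is a minimal sketch of one way a penalty on very short vocal or non-vocal segments could be written. It is not the loss from the paper, whose exact form the abstract does not give. The function name `short_segment_penalty`, the parameter `min_len`, and the pairwise-transition formulation are all assumptions made for illustration. The idea is that a segment shorter than `min_len` frames has two voicing transitions less than `min_len` frames apart, so the product of their soft transition strengths is penalized. The expression uses only differences, absolute values, and products of per-frame voicing probabilities, so it could be ported to an autodiff framework.

```python
def short_segment_penalty(voicing_probs, min_len):
    """Illustrative penalty (not the paper's loss) that grows when two
    voicing transitions occur fewer than `min_len` frames apart."""
    # Soft transition strength between consecutive frames, each in [0, 1].
    d = [abs(b - a) for a, b in zip(voicing_probs, voicing_probs[1:])]
    penalty = 0.0
    for t in range(len(d)):
        # Pair transition t with every later transition closer than min_len.
        for j in range(1, min_len):
            if t + j < len(d):
                penalty += d[t] * d[t + j]
    return penalty


# Hypothetical per-frame voicing probabilities (1.0 = vocal, 0.0 = non-vocal).
stable = [0.0] * 5 + [1.0] * 10 + [0.0] * 5   # one long vocal segment
flicker = [0.0] * 9 + [1.0] * 2 + [0.0] * 9   # a 2-frame vocal blip

print(short_segment_penalty(stable, min_len=4))   # 0.0
print(short_segment_penalty(flicker, min_len=4))  # 1.0
```

The long segment incurs no penalty because its two boundaries are ten frames apart, while the two-frame blip is penalized. Softer (less confident) blips would receive proportionally smaller penalties.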
Related papers
- TONet: Tone-octave Network For Singing Melody Extraction From Polyphonic Music (2022) · 9.76
- A Streamlined Encoder/decoder Architecture For Melody Extraction (2018) · 12.68
- Analysing Deep Learning-spectral Envelope Prediction Methods For Singing Synthesis (2019) · 4.52
- A Melody-unsupervision Model For Singing Voice Synthesis (2021) · 5.84
- Adversarial Multi-task Learning For Disentangling Timbre And Pitch In Singing Voice Synthesis (2022) · 4.52
- Singing Voice Conversion With Accompaniment Using Self-supervised Representation-based Melody Features (2025) · 0.00
- A Neural Parametric Singing Synthesizer (2017) · 10.97
- Melody Extraction From Polyphonic Music By Deep Learning Approaches: A Review (2022) · 0.00