TONet: Tone-octave Network For Singing Melody Extraction From Polyphonic Music
2022 · Ke Chen, Shuai Yu, Cheng-I Wang, et al.
Abstract
Singing melody extraction is an important problem in the field of music information retrieval. Existing methods typically rely on frequency-domain representations to estimate the sung frequencies. However, this design does not lead to human-level performance in the perception of melody information for both tone (pitch-class) and octave. In this paper, we propose TONet, a plug-and-play model that improves both tone and octave perceptions by leveraging a novel input representation and a novel network architecture. First, we present an improved input representation, the Tone-CFP, that explicitly groups harmonics via a rearrangement of frequency-bins. Second, we introduce an encoder-decoder architecture that is designed to obtain a salience feature map, a tone feature map, and an octave feature map. Third, we propose a tone-octave fusion mechanism to improve the final salience feature map. Experiments are done to verify the capability of TONet with various baseline backbone models. Our res
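The tone/octave split and the bin rearrangement described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes 12-tone equal temperament, A4 = 440 Hz, MIDI note numbering, and a log-frequency axis with exactly 12 bins per octave. The function names `tone_octave`, `tone_major_order`, and `rearrange` are hypothetical, and the reordering is only one plausible reading of "grouping harmonics via a rearrangement of frequency bins".

```python
import math

A4_HZ = 440.0   # assumed tuning reference (not taken from the paper)
A4_MIDI = 69    # MIDI note number of A4
TONES = 12      # pitch classes per octave in equal temperament


def tone_octave(freq_hz):
    """Split a frequency into (tone, octave).

    tone is the pitch class 0..11 (0 = C), octave follows scientific
    pitch notation (A4 = 440 Hz lies in octave 4).
    """
    midi = round(A4_MIDI + TONES * math.log2(freq_hz / A4_HZ))
    return midi % TONES, midi // TONES - 1


def tone_major_order(n_bins, bins_per_octave=TONES):
    """Permutation that puts bins sharing a pitch class next to each other.

    Input bins are assumed to be ordered octave by octave; the output
    order is sorted by pitch class first, then by octave.
    """
    return sorted(
        range(n_bins),
        key=lambda b: (b % bins_per_octave, b // bins_per_octave),
    )


def rearrange(frame, bins_per_octave=TONES):
    """Reorder one spectral frame (a list of bin values) into tone-major order."""
    order = tone_major_order(len(frame), bins_per_octave)
    return [frame[b] for b in order]


if __name__ == "__main__":
    print(tone_octave(440.0))          # A4
    print(tone_octave(880.0))          # A5: same tone, next octave
    print(tone_major_order(24)[:4])    # first bins of a two-octave axis
```

In this layout an octave error (same tone, wrong octave) and a tone error (wrong pitch class) become two separate quantities, which is the distinction the abstract draws between tone and octave perception.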
Related papers
- Towards Improving Harmonic Sensitivity And Prediction Stability For Singing Melody Extraction (2023) 0.00
- A Streamlined Encoder/decoder Architecture For Melody Extraction (2018) 12.68
- Vocal Melody Extraction Using Patch-based CNN (2018) 12.47
- Melody Extraction From Polyphonic Music By Deep Learning Approaches: A Review (2022) 0.00
- MBTFNet: Multi-band Temporal-frequency Neural Network For Singing Voice Enhancement (2023) 3.58
- STONE: Self-supervised Tonality Estimator (2024) 0.00
- MaD TwinNet: Masker-denoiser Architecture With Twin Networks For Monaural Sound Source Separation (2018) 0.00
- Multiple F0 Estimation In Vocal Ensembles Using Convolutional Neural Networks (2020) 0.00