Self-supervised Disentanglement Of Harmonic And Rhythmic Features In Music Audio Signals
2023 · Yiming Wu
Abstract
Latent variable disentanglement aims to infer the multiple informative latent representations that lie behind a data generation process, and it is a key factor in controllable data generation. In this paper, we propose a deep neural network-based self-supervised learning method to infer the disentangled rhythmic and harmonic representations behind music audio generation. We train a variational autoencoder that generates an audio mel-spectrogram from two latent features representing the rhythmic and harmonic content. During training, the variational autoencoder learns to reconstruct the input mel-spectrogram given its pitch-shifted version. At each forward pass during training, a vector rotation operation is applied to one of the latent features, assuming that the dimensions of the feature vectors are related to pitch intervals. Therefore, in the trained variational autoencoder, the rotated latent feature represents the pitch-related information of the mel-sp
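The vector rotation mentioned in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the paper's implementation: it assumes the rotation is a circular shift along the feature axis, that the harmonic latent has 12 dimensions with one dimension per semitone, and the names `rotate_latent` and `z_harmonic` are hypothetical.

```python
import numpy as np


def rotate_latent(z_harmonic: np.ndarray, semitones: int) -> np.ndarray:
    """Circularly shift the last axis of a latent vector.

    Assumption: each dimension corresponds to one semitone, so shifting
    by `semitones` positions stands in for a pitch shift of that interval.
    """
    return np.roll(z_harmonic, shift=semitones, axis=-1)


# Toy 12-dimensional "harmonic" latent (hypothetical size, one slot per semitone).
z = np.arange(12, dtype=np.float32)

z_up = rotate_latent(z, 2)        # emulate a shift up by two semitones
z_back = rotate_latent(z_up, -2)  # the inverse rotation restores the original
z_octave = rotate_latent(z, 12)   # a full cycle leaves the vector unchanged
```

Under this reading, the rhythmic latent would be left untouched while only the harmonic latent is rotated, which is what would push pitch-related information into the rotated feature.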
Related papers
- Towards Robust Unsupervised Disentanglement Of Sequential Data -- A Case Study Using Music Audio (2022) 0.00
- Disentangling Speech And Non-speech Components For Building Robust Acoustic Models From Found Data (2019) 0.00
- Rethinking Recurrent Latent Variable Model For Music Composition (2018) 7.50
- Evaluation Of Latent Space Disentanglement In The Presence Of Interdependent Attributes (2021) 0.00
- Semi-supervised Neural Chord Estimation Based On A Variational Autoencoder With Latent Chord Labels And Features (2020) 7.16
- Interpretable Timbre Synthesis Using Variational Autoencoders Regularized On Timbre Descriptors (2023) 0.00
- Adversarial Multi-task Learning For Disentangling Timbre And Pitch In Singing Voice Synthesis (2022) 4.52
- Learning Style-aware Symbolic Music Representations By Adversarial Autoencoders (2020) 2.26