Towards Robust Unsupervised Disentanglement Of Sequential Data -- A Case Study Using Music Audio
2022 · Yin-Jyun Luo, Sebastian Ewert, Simon Dixon
Abstract
Disentangled sequential autoencoders (DSAEs) are a class of probabilistic graphical models that describe an observed sequence with dynamic latent variables and a static latent variable. The former encode information at a frame rate identical to that of the observation, while the latter globally governs the entire sequence. This introduces an inductive bias and facilitates unsupervised disentanglement of the underlying local and global factors. In this paper, we show that the vanilla DSAE is sensitive to the choice of model architecture and to the capacity of the dynamic latent variables, and is prone to collapsing the static latent variable. As a countermeasure, we propose TS-DSAE, a two-stage training framework that first learns sequence-level prior distributions, which are subsequently employed to regularise the model and to support auxiliary objectives that promote disentanglement. The proposed framework is fully unsupervised and robust against the global factor collapse problem.
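The latent layout described in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that a decoder consumes each frame's dynamic latent concatenated with the single sequence-level static latent. The function name `build_decoder_inputs`, the dimensions `dim_z` and `dim_v`, and the use of concatenation are all hypothetical choices made for illustration.

```python
import numpy as np


def build_decoder_inputs(z_dynamic, v_static):
    """Pair every frame's dynamic latent with the one static latent.

    z_dynamic: array of shape (num_frames, dim_z), one latent per frame.
    v_static:  array of shape (dim_v,), one latent for the whole sequence.
    Returns an array of shape (num_frames, dim_z + dim_v).
    """
    z_dynamic = np.asarray(z_dynamic)
    v_static = np.asarray(v_static)
    num_frames = z_dynamic.shape[0]
    # Repeat the static latent once per frame so that every frame
    # is conditioned on the same global code.
    v_tiled = np.tile(v_static, (num_frames, 1))
    return np.concatenate([z_dynamic, v_tiled], axis=1)


# Hypothetical sizes: 100 frames, 16-dim dynamic and 8-dim static latents.
rng = np.random.default_rng(0)
num_frames, dim_z, dim_v = 100, 16, 8
z = rng.standard_normal((num_frames, dim_z))
v = rng.standard_normal(dim_v)
inputs = build_decoder_inputs(z, v)
```

In this layout the static part of `inputs` is identical in every row, while the dynamic part varies from frame to frame. The collapse problem mentioned in the abstract corresponds to a model that ignores the static columns and relies on the dynamic ones alone.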
Related papers
- Disentangled Sequential Autoencoder (2018) 0.00
- Self-supervised Disentanglement Of Harmonic And Rhythmic Features In Music Audio Signals (2023) 0.00
- Disentangling Speech And Non-speech Components For Building Robust Acoustic Models From Found Data (2019) 0.00
- Exploring Single-song Autoencoding Schemes For Audio-based Music Structure Analysis (2021) 0.00
- Audio Word2vec: Unsupervised Learning Of Audio Segment Representations Using Sequence-to-sequence Autoencoder (2016) 0.00
- Self-supervised Disentangled Representation Learning For Robust Target Speech Extraction (2023) 5.24
- Towards The Next Frontier In Speech Representation Learning Using Disentanglement (2024) 0.00
- High-fidelity Audio Generation And Representation Learning With Guided Adversarial Autoencoder (2020) 8.35