Learning Complex Basis Functions For Invariant Representations Of Audio
2019 · Stefan Lattner, Monika Dörfler, Andreas Arzt
Abstract
Learning features from data has been shown to be more successful than using hand-crafted features for many machine learning tasks. In music information retrieval (MIR), features learned from windowed spectrograms are highly variant to transformations like transposition or time-shift. Such variances are undesirable when they are irrelevant for the respective MIR task. We propose an architecture called Complex Autoencoder (CAE) which learns features invariant to orthogonal transformations. Mapping signals onto complex basis functions learned by the CAE results in a transformation-invariant "magnitude space" and a transformation-variant "phase space". The phase space is useful to infer transformations between data pairs. When exploiting the invariance property of the magnitude space, we achieve state-of-the-art results in audio-to-score alignment and repeated section discovery for audio. A PyTorch implementation of the CAE, including the repeated section discovery method, is available online.
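The split into a magnitude space and a phase space can be illustrated with a fixed basis instead of a learned one. The sketch below is not the CAE: it assumes the discrete Fourier basis as a stand-in for the learned complex basis functions and a circular time-shift as the orthogonal transformation. The function names (`fourier_basis`, `project`, `circular_shift`) and the example signal are hypothetical and chosen only for illustration.

```python
import cmath
import math


def fourier_basis(n):
    """Rows are complex basis functions (the DFT basis).

    Assumption: a fixed stand-in for the basis a CAE would learn from data.
    """
    return [[cmath.exp(-2j * math.pi * k * t / n) for t in range(n)]
            for k in range(n)]


def project(basis, x):
    """Project a real signal onto each complex basis function."""
    return [sum(b * v for b, v in zip(row, x)) for row in basis]


def circular_shift(x, s):
    """Circular time-shift by s samples: a permutation, hence orthogonal."""
    s %= len(x)
    return x[-s:] + x[:-s] if s else list(x)


n = 8
x = [0.0, 1.0, 3.0, -2.0, 0.5, 4.0, -1.0, 2.0]  # arbitrary example signal
y = circular_shift(x, 3)                         # transformed copy of x

basis = fourier_basis(n)
zx = project(basis, x)
zy = project(basis, y)

# "Magnitude space": identical for x and its shifted version.
mag_x = [abs(z) for z in zx]
mag_y = [abs(z) for z in zy]

# "Phase space": the phase difference at frequency k = 1 encodes the shift,
# because zy[k] = zx[k] * exp(-2*pi*i*k*s/n) for a circular shift by s.
phase_diff = cmath.phase(zy[1] / zx[1])
estimated_shift = round(-phase_diff * n / (2 * math.pi)) % n

print(max(abs(a - b) for a, b in zip(mag_x, mag_y)))  # close to zero
print(estimated_shift)
```

In this toy setting the magnitudes of `x` and `y` agree up to floating-point error, while the phases differ in a way that recovers the shift. The CAE described in the abstract learns its basis from data pairs, so that the same behaviour holds for the transformations present in the training data rather than only for time-shifts.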
Related papers
- Learning Linearity In Audio Consistency Autoencoders Via Implicit Regularization (2025) · 0.00
- Audio-to-score Alignment Using Transposition-invariant Features (2018) · 0.00
- Audioformer: Audio Transformer Learns Audio Feature Representations From Discrete Acoustic Codes (2023) · 0.00
- Complexdec: A Domain-robust High-fidelity Neural Audio Codec With Complex Spectrum Modeling (2025) · 3.58
- Music2latent2: Audio Compression With Summary Embeddings And Autoregressive Decoding (2025) · 2.26
- Learning Style-aware Symbolic Music Representations By Adversarial Autoencoders (2020) · 2.26
- Similarity Measures For Vocal-based Drum Sample Retrieval Using Deep Convolutional Auto-encoders (2018) · 2.26
- A-JEPA: Joint-embedding Predictive Architecture Can Listen (2023) · 0.00