Music2Latent2: Audio Compression With Summary Embeddings And Autoregressive Decoding
2025 · Marco Pasini, Stefan Lattner, George Fazekas
Abstract
Efficiently compressing high-dimensional audio signals into a compact and informative latent space is crucial for various tasks, including generative modeling and music information retrieval (MIR). Existing audio autoencoders, however, often struggle to achieve high compression ratios while preserving audio fidelity and facilitating efficient downstream applications. We introduce Music2Latent2, a novel audio autoencoder that addresses these limitations by leveraging consistency models and a novel approach to representation learning based on unordered latent embeddings, which we call summary embeddings. Unlike conventional methods that encode local audio features into ordered sequences, Music2Latent2 compresses audio signals into sets of summary embeddings, where each embedding can capture distinct global features of the input sample. This enables higher reconstruction quality at the same compression ratio. To handle arbitrary audio lengths, Music2Latent2 employs an autoregre
Related papers
- Exploring Single-song Autoencoding Schemes For Audio-based Music Structure Analysis (2021) 0.00
- Learning Style-aware Symbolic Music Representations By Adversarial Autoencoders (2020) 2.26
- Audio Language Modeling Using Perceptually-guided Discrete Representations (2022) 0.00
- SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec For General Sound (2024) 10.97
- Learning Linearity In Audio Consistency Autoencoders Via Implicit Regularization (2025) 0.00
- Modeling Strategies For Speech Enhancement In The Latent Space Of A Neural Audio Codec (2025) 0.00
- An Investigation Of The Reconstruction Capacity Of Stacked Convolutional Autoencoders For Log-mel-spectrograms (2023) 0.00
- InspireMusic: Integrating Super Resolution And Large Language Model For High-fidelity Long-form Music Generation (2025) 6.26