An Investigation Of The Reconstruction Capacity Of Stacked Convolutional Autoencoders For Log-mel-spectrograms
2023 · Anastasia Natsiou, Luca Longo, Sean O'Leary
Abstract
In audio processing applications, the generation of expressive sounds from high-level representations is in high demand. These representations can be used to manipulate timbre and to influence the synthesis of creative instrumental notes. Modern algorithms, such as neural networks, have inspired the development of expressive synthesizers based on the compression of musical instrument timbre. Unsupervised deep learning methods can achieve audio compression by training a network to learn a mapping from waveforms or spectrograms to low-dimensional representations. This study investigates the use of stacked convolutional autoencoders for compressing time-frequency audio representations of a variety of instruments at a single pitch. Further exploration of hyper-parameters and regularization techniques is shown to enhance the performance of the initial design. In an unsupervised manner, the network is able to reconstruct a monophonic and harmonic sound based on laten