Audio Time-scale Modification With Temporal Compressing Networks
2022 · Ernie Chu, Ju-Ting Chen, Chia-Ping Chen
Abstract
We propose a novel approach for time-scale modification of audio signals. Unlike traditional methods that rely on framing or the short-time Fourier transform to preserve frequency content during temporal stretching, our neural network model encodes the raw audio into a high-level latent representation, dubbed Neuralgram, in which each vector represents 1024 audio sample points. Because the compression ratio is sufficiently high, we can apply arbitrary spatial interpolation to the Neuralgram to perform temporal stretching. Finally, a learned neural decoder synthesizes the time-scaled audio samples from the stretched Neuralgram representation. Both the encoder and the decoder are trained with latent regression losses and adversarial losses to obtain high-fidelity audio samples. Despite its simplicity, our method performs comparably to existing baselines and opens a new possibility for research into modern time-scale modification. Audio samples can be found a
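The stretching step described in the abstract can be illustrated with a minimal sketch. It assumes the Neuralgram is a sequence of latent vectors, one per 1024 samples, and uses plain linear interpolation along the time axis. The abstract says only "arbitrary spatial interpolation", so the interpolation scheme, the function name `stretch_latents`, and the toy latent sizes are all assumptions for illustration, not the authors' implementation. The encoder and decoder are omitted.

```python
import math

HOP = 1024  # audio samples per Neuralgram vector, as stated in the abstract


def stretch_latents(latents, rate):
    """Linearly interpolate a list of latent vectors along the time axis.

    rate > 1 lengthens the sequence (slower audio); rate < 1 shortens it.
    Linear interpolation is an assumption made for this sketch.
    """
    n_in = len(latents)
    n_out = max(1, round(n_in * rate))
    out = []
    for i in range(n_out):
        # Fractional position of output frame i within the input sequence.
        pos = i * (n_in - 1) / (n_out - 1) if n_out > 1 else 0.0
        lo = min(int(math.floor(pos)), n_in - 1)
        hi = min(lo + 1, n_in - 1)
        w = pos - lo
        # Blend the two neighbouring latent vectors channel by channel.
        out.append([(1.0 - w) * a + w * b
                    for a, b in zip(latents[lo], latents[hi])])
    return out


# Hypothetical stand-in for encoder output: 10 latent vectors with 2 channels.
z = [[float(t), float(-t)] for t in range(10)]
z_slow = stretch_latents(z, 1.5)
stretched_samples = len(z_slow) * HOP
```

Here a 10-vector sequence stretched by 1.5 becomes 15 vectors, so a decoder that emits 1024 samples per vector would produce 15360 samples in place of the original 10240.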
Related papers
- Efficient Neural Networks For Real-time Modeling Of Analog Dynamic Range Compression (2021)
- Time Domain Neural Audio Style Transfer (2017)
- Stftcodec: High-fidelity Audio Compression Through Time-frequency Domain Representation (2025)
- Neuralogram: A Deep Neural Network Based Representation For Audio Signals (2019)
- Bandwidth Extension On Raw Audio Via Generative Adversarial Networks (2019)
- Audio Super Resolution Using Neural Networks (2017)
- An Investigation Of The Reconstruction Capacity Of Stacked Convolutional Autoencoders For Log-mel-spectrograms (2023)
- Autoencoder Based Architecture For Fast & Real Time Audio Style Transfer (2018)