Time Domain Neural Audio Style Transfer
2017 · Parag K. Mital
Abstract
A recently published method for audio style transfer has shown how to extend the process of image style transfer to audio. This method synthesizes audio "content" and "style" independently using the magnitudes of a short-time Fourier transform, shallow convolutional networks with randomly initialized filters, and iterative phase reconstruction with Griffin-Lim. In this work, we explore whether it is possible to directly optimize a time-domain audio signal, removing the phase reconstruction step and opening up possibilities for real-time applications and higher-quality synthesis. We explore a variety of style transfer processes on neural networks that operate directly on time-domain audio signals and demonstrate one such network capable of audio stylization.
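As a rough illustration of what a style objective on a raw waveform could look like, the sketch below passes a time-domain signal through a single layer of randomly initialized 1-D filters and compares Gram matrices of the resulting features. This is not the network from the paper: the Gram-matrix statistic is borrowed from image style transfer as an assumption, and the function names, filter count, kernel size, and test signals are all hypothetical choices made for the example.

```python
import numpy as np

def random_conv_features(signal, n_filters=8, kernel_size=16, seed=0):
    """Apply randomly initialized 1-D filters to a raw waveform, then ReLU.

    Assumption: one shallow layer with fixed random weights; sizes are illustrative.
    """
    rng = np.random.default_rng(seed)
    filters = rng.standard_normal((n_filters, kernel_size)) / np.sqrt(kernel_size)
    # np.convolve flips the kernel, which is harmless for random filters.
    # mode="valid" keeps only positions where the kernel fully overlaps the signal.
    feats = np.stack([np.convolve(signal, f, mode="valid") for f in filters])
    return np.maximum(feats, 0.0)

def gram_matrix(feats):
    """Filter-by-filter correlations averaged over time (assumed style statistic)."""
    return feats @ feats.T / feats.shape[1]

def style_loss(candidate, style, **kw):
    """Mean squared difference between the Gram matrices of two waveforms."""
    g_c = gram_matrix(random_conv_features(candidate, **kw))
    g_s = gram_matrix(random_conv_features(style, **kw))
    return float(np.mean((g_c - g_s) ** 2))

# Hypothetical signals: a 440 Hz tone as "style", low-level noise as the candidate.
t = np.arange(4096) / 16000.0
style = np.sin(2 * np.pi * 440.0 * t)
candidate = 0.1 * np.random.default_rng(1).standard_normal(t.size)

loss_same = style_loss(style, style)
loss_diff = style_loss(candidate, style)
```

In an actual stylization loop, the candidate waveform would be updated by gradient descent on a loss of this kind using an automatic differentiation framework. Because the optimized variable is the waveform itself, no separate phase reconstruction step is needed.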
Related papers
- Autoencoder Based Architecture For Fast & Real Time Audio Style Transfer (2018)
- Play As You Like: Timbre-enhanced Multi-modal Music Style Transfer (2018)
- Differentiable WORLD Synthesizer-based Neural Vocoder With Application To End-to-end Audio Style Transfer (2022)
- Stylus: Repurposing Stable Diffusion For Training-free Music Style Transfer On Mel-spectrograms (2024)
- Audio Time-scale Modification With Temporal Compressing Networks (2022)
- Fine-grained Style Modeling, Transfer And Prediction In Text-to-speech Synthesis Via Phone-level Content-style Disentanglement (2020)
- Towards Evaluating The Robustness Of Automatic Speech Recognition Systems Via Audio Style Transfer (2024)
- Timbre Transfer With Variational Auto Encoding And Cycle-consistent Adversarial Networks (2021)