nnAudio: An On-the-fly GPU Audio To Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks
2019 · Kin Wai Cheuk, Hans Anderson, Kat Agres, et al.
Abstract
Converting time-domain waveforms to frequency-domain spectrograms is typically treated as a preprocessing step performed before model training. This approach, however, has several drawbacks. First, storing the different frequency-domain representations takes a lot of hard disk space. This is especially true during model development and tuning, when various types of spectrograms are explored for optimal performance. Second, if another dataset is used, all the audio clips must be processed again before the network can be retrained. In this paper, we integrate the time-domain to frequency-domain conversion into the model structure and propose a neural-network-based toolbox, nnAudio, which leverages 1D convolutional neural networks to perform the conversion during the feed-forward pass. It allows on-the-fly spectrogram generation without the need to store any spectrograms on disk. This approach also allows back-propagation on the waveforms-to-spectrog
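The core idea in the abstract, computing a spectrogram with 1D convolutions, can be sketched in a few lines. The following is a minimal illustration in plain Python, not nnAudio's actual API: the function names `fourier_kernels`, `conv1d`, and `magnitude_spectrogram` are hypothetical, and a rectangular window with no padding is assumed. Each frequency bin gets a cosine kernel and a sine kernel; sliding them over the waveform with a stride equal to the hop size yields the real and imaginary parts of a short-time Fourier transform.

```python
import math

def fourier_kernels(n_fft):
    """Build one cosine and one sine kernel per frequency bin 0..n_fft//2."""
    n_bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    sin_k = [[math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(n_bins)]
    return cos_k, sin_k

def conv1d(signal, kernel, stride):
    """Strided 1D cross-correlation without padding.

    This is the operation deep learning libraries call "convolution".
    """
    width = len(kernel)
    return [sum(signal[start + i] * kernel[i] for i in range(width))
            for start in range(0, len(signal) - width + 1, stride)]

def magnitude_spectrogram(signal, n_fft=64, hop=32):
    """Magnitude STFT (rectangular window), indexed as spec[bin][frame]."""
    cos_k, sin_k = fourier_kernels(n_fft)
    spec = []
    for kc, ks in zip(cos_k, sin_k):
        real = conv1d(signal, kc, hop)   # real part of each frame's DFT bin
        imag = conv1d(signal, ks, hop)   # imaginary part (sign does not affect magnitude)
        spec.append([math.hypot(r, i) for r, i in zip(real, imag)])
    return spec

# A sine wave that completes exactly 8 cycles per 64-sample frame.
wave = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = magnitude_spectrogram(wave)
peak_bin = max(range(len(spec)), key=lambda k: spec[k][0])
```

Here `peak_bin` comes out as 8, matching the frequency of the test tone. In a deep learning framework the same kernels would serve as the weights of a 1D convolution layer, so the conversion runs as part of the forward pass on the same device as the rest of the model.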
Related papers
- Audio Spectrogram Representations For Processing With Convolutional Neural Networks (2017)
- Audio Classification Of Low Feature Spectrograms Utilizing Convolutional Neural Networks (2024)
- Time Domain Neural Audio Style Transfer (2017)
- Fast Spectrogram Inversion Using Multi-head Convolutional Neural Networks (2018)
- Audio Time-scale Modification With Temporal Compressing Networks (2022)
- Dynamic Convolutional Neural Networks As Efficient Pre-trained Audio Models (2023)
- Utilizing Domain Knowledge In End-to-end Audio Processing (2017)
- Adversarial Generation Of Time-frequency Features With Application In Audio Synthesis (2019)