Fast Spectrogram Inversion Using Multi-head Convolutional Neural Networks
2018 · Sercan O. Arik, Heewoo Jun, Gregory Diamos
Abstract
We propose the multi-head convolutional neural network (MCNN) architecture for waveform synthesis from spectrograms. MCNN performs nonlinear interpolation with transposed convolution layers arranged in parallel heads. It achieves more than an order of magnitude higher compute intensity than commonly used iterative algorithms such as Griffin-Lim, which lets it make efficient use of modern multi-core processors and enables very fast waveform synthesis (more than 300x real time). To train MCNN, we use a large-scale speech recognition dataset and losses defined on waveforms that are related to perceptual audio quality. We demonstrate that MCNN is a very promising approach for high-quality speech synthesis that requires no iterative algorithms or autoregression.
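The abstract describes MCNN only at a high level. The NumPy sketch below illustrates the core idea under stated assumptions: each head is a stack of stride-2 transposed 1-D convolutions that upsamples a (frequency × frame) spectrogram to a waveform, and the heads' outputs are combined by a weighted sum. The layer widths, kernel size, ELU activations, softsign output squashing, random untrained weights, and the helper names (`conv_transpose1d`, `init_head`, `run_head`, `mcnn_forward`) are hypothetical choices for illustration, not the paper's configuration.

```python
import numpy as np


def conv_transpose1d(x, w, b, stride):
    """Transposed 1-D convolution, cropped so the output has t_in * stride samples.

    x: (c_in, t_in), w: (c_in, c_out, k), b: (c_out,)
    """
    c_in, t_in = x.shape
    _, c_out, k = w.shape
    out = np.zeros((c_out, (t_in - 1) * stride + k))
    # Each input step t scatters a (c_out, k) patch starting at t * stride.
    contrib = np.einsum("it,iok->tok", x, w)
    for t in range(t_in):
        out[:, t * stride:t * stride + k] += contrib[t]
    return out[:, :t_in * stride] + b[:, None]


def elu(x):
    # expm1 on the clipped input avoids overflow for large positive x.
    return np.where(x > 0, x, np.expm1(np.minimum(x, 0.0)))


def init_head(rng, n_freq, channels, kernel):
    """Hypothetical head: random (w, b) per layer, ending in 1 output channel."""
    dims = [n_freq] + channels + [1]
    return [(rng.normal(0.0, 1.0 / np.sqrt(c_in * kernel), (c_in, c_out, kernel)),
             np.zeros(c_out))
            for c_in, c_out in zip(dims[:-1], dims[1:])]


def run_head(spec, layers, stride):
    h = spec
    for i, (w, b) in enumerate(layers):
        h = conv_transpose1d(h, w, b, stride)
        if i < len(layers) - 1:  # assumed: no activation on the last layer
            h = elu(h)
    return h[0]  # (n_frames * stride ** len(layers),)


def mcnn_forward(spec, heads, head_weights, stride):
    """Weighted sum of head outputs, squashed to (-1, 1) by softsign (assumed)."""
    y = sum(lam * run_head(spec, layers, stride)
            for lam, layers in zip(head_weights, heads))
    return y / (1.0 + np.abs(y))


rng = np.random.default_rng(0)
n_freq, n_frames, stride = 80, 10, 2
channels = [64, 32, 16]  # 4 layers of stride 2 -> 16 samples per frame
heads = [init_head(rng, n_freq, channels, kernel=4) for _ in range(3)]
spec = np.abs(rng.normal(size=(n_freq, n_frames)))  # stand-in magnitude spectrogram
wave = mcnn_forward(spec, heads, head_weights=np.ones(3) / 3, stride=stride)
print(wave.shape)  # (160,)
```

Every head runs the same fixed feed-forward pass over all frames at once, with no iteration loop and no sample-to-sample dependency, which is the contrast the abstract draws with iterative methods like Griffin-Lim. A practical implementation would replace the per-frame scatter loop with a framework's transposed-convolution primitive.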
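The "more than 300x real-time" figure is a ratio of audio duration to synthesis time. Under that standard reading (an assumption about how the speedup is defined here), it bounds the synthesis time per second of generated audio:

$$\text{speedup} = \frac{T_{\text{audio}}}{T_{\text{synth}}} > 300 \quad\Longrightarrow\quad T_{\text{synth}} < \frac{T_{\text{audio}}}{300} \approx 3.3\,\text{ms per } 1\,\text{s of audio}.$$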
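The abstract does not name its waveform losses. As a hedged illustration, two losses commonly used for this purpose compare short-time Fourier transform (STFT) magnitudes of the reference and synthesized waveforms: spectral convergence (relative Frobenius-norm error) and mean absolute log-magnitude error. Whether these match the paper's exact loss set is an assumption, as are the FFT size, hop, Hann window, and the helper names `stft_mag`, `spectral_convergence`, and `log_mag_l1`.

```python
import numpy as np


def stft_mag(x, n_fft=256, hop=64):
    """|STFT| with a Hann window and no padding; shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def spectral_convergence(ref, est, eps=1e-8):
    """|| |S_ref| - |S_est| ||_F / || |S_ref| ||_F"""
    r, e = stft_mag(ref), stft_mag(est)
    return np.linalg.norm(r - e) / (np.linalg.norm(r) + eps)


def log_mag_l1(ref, est, eps=1e-5):
    """Mean absolute difference of log STFT magnitudes."""
    r, e = stft_mag(ref), stft_mag(est)
    return np.mean(np.abs(np.log(r + eps) - np.log(e + eps)))


sr = 16000
t = np.arange(sr) / sr
clean = np.sin(2 * np.pi * 440 * t)  # 1 s, 440 Hz test tone
noisy = clean + 0.1 * np.random.default_rng(0).normal(size=sr)
print(spectral_convergence(clean, clean))  # 0.0
print(spectral_convergence(clean, noisy) > 0, log_mag_l1(clean, noisy) > 0)  # True True
```

Identical waveforms give zero for both losses, and added noise makes both positive; in training, such terms would be computed between the model output and the ground-truth recording.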
Related papers
- UnivNet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021) · 14.80
- Complex Spectrogram Enhancement By Convolutional Neural Network With Multi-metrics Learning (2017) · 15.57
- Mathematical Vocoder Algorithm: Modified Spectral Inversion For Efficient Neural Speech Synthesis (2021) · 0.00
- nnAudio: An On-the-fly GPU Audio To Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks (2019) · 13.70
- Monaural Speech Enhancement Using A Multi-branch Temporal Convolutional Network (2019) · 3.58
- MuSLCAT: Multi-scale Multi-level Convolutional Attention Transformer For Discriminative Music Modeling On Raw Waveforms (2021) · 0.00
- Audio Spectrogram Representations For Processing With Convolutional Neural Networks (2017) · 0.00
- PCNN: A Lightweight Parallel Conformer Neural Network For Efficient Monaural Speech Enhancement (2023) · 6.77