Mathematical Vocoder Algorithm: Modified Spectral Inversion for Efficient Neural Speech Synthesis
2021 · Hyun Gon Ryu, Jeong-Hoon Kim, Simon See
Abstract
In this work, we propose a new mathematical vocoder algorithm (modified spectral inversion) that generates a waveform from acoustic features without phase estimation. The main benefit of the proposed method is that it removes the neural-vocoder training stage from the end-to-end speech synthesis model. Our implementation can synthesize high-fidelity speech at approximately 20 MHz (million samples per second) on CPU and 59.6 MHz on GPU, which is 909 and 2,702 times faster than real time, respectively. Since the proposed method is not data-driven, it applies to unseen voices and multiple languages without any additional work. We expect the proposed method to facilitate research on neural network models capable of synthesizing speech at studio-recording quality.
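The speed-up figures in the abstract follow from dividing generation throughput (samples per second) by the audio sampling rate. The abstract does not state that sampling rate. The minimal sketch below assumes 22.05 kHz, a common rate for speech synthesis, and also inverts the relation to show which sampling rate each reported speed-up implies. The constant `SAMPLE_RATE_HZ` and both helper functions are hypothetical names used only for illustration.

```python
# Assumption (not stated in the abstract): audio is sampled at 22.05 kHz.
SAMPLE_RATE_HZ = 22_050.0


def real_time_factor(samples_per_second: float,
                     sample_rate_hz: float = SAMPLE_RATE_HZ) -> float:
    """Seconds of audio produced per second of wall-clock time."""
    return samples_per_second / sample_rate_hz


def implied_sample_rate(samples_per_second: float, speedup: float) -> float:
    """Sampling rate at which a given throughput equals the reported speed-up."""
    return samples_per_second / speedup


if __name__ == "__main__":
    # Throughputs and speed-ups as quoted in the abstract (MHz = 1e6 samples/s).
    reported = [("CPU", 20e6, 909), ("GPU", 59.6e6, 2702)]
    for device, throughput, speedup in reported:
        rtf = real_time_factor(throughput)
        rate = implied_sample_rate(throughput, speedup)
        print(f"{device}: {rtf:,.0f}x real time at 22.05 kHz; "
              f"reported {speedup:,}x implies {rate:,.0f} Hz")
```

Running it gives about 907x for the CPU and 2,703x for the GPU. Both reported speed-ups imply a sampling rate of roughly 22 kHz, so the small differences from 909 and 2,702 are consistent with rounding in the reported figures. The CPU throughput, in particular, is only given as "approximately 20 MHz".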
Related papers
- Vocos: Closing The Gap Between Time-domain And Fourier-based Neural Vocoders For High-quality Audio Synthesis (2023) · 6.10
- VocGAN: A High-fidelity Real-time Vocoder With A Hierarchically-nested Adversarial Network (2020) · 12.54
- Puffin: Pitch-synchronous Neural Waveform Generation For Fullband Speech On Modest Devices (2022) · 3.58
- Ultra-lightweight Neural Differential DSP Vocoder For High Quality Speech Synthesis (2024) · 5.24
- A Neural Denoising Vocoder For Clean Waveform Generation From Noisy Mel-spectrogram Based On Amplitude And Phase Predictions (2024) · 0.00
- BiVocoder: A Bidirectional Neural Vocoder Integrating Feature Extraction And Waveform Generation (2024) · 5.84
- Towards Parametric Speech Synthesis Using Gaussian-Markov Model Of Spectral Envelope And Wavelet-based Decomposition Of F0 (2022) · 0.00
- UnivNet: A Neural Vocoder With Multi-resolution Spectrogram Discriminators For High-fidelity Waveform Generation (2021) · 14.80