Parallel Synthesis For Autoregressive Speech Generation
2022 · Po-Chun Hsu, Da-Rong Liu, Andy T. Liu, et al.
Abstract
Autoregressive neural vocoders have achieved outstanding performance in speech synthesis tasks such as text-to-speech and voice conversion. An autoregressive vocoder predicts the sample at each time step conditioned on the samples at previous time steps. Although it synthesizes natural-sounding speech, the iterative generation makes the synthesis time proportional to the utterance length, which limits efficiency. Many works have therefore aimed to generate the whole speech sequence in parallel, proposing GAN-based, flow-based, and score-based vocoders. This paper proposes a different approach to autoregressive generation. Instead of iteratively predicting samples along the time axis, the proposed model performs frequency-wise autoregressive generation (FAR) and bit-wise autoregressive generation (BAR) to synthesize speech. In FAR, a speech utterance is split into frequency subbands, and each subband is generated conditioned on the previously generated one. Similarly, in BAR, an 8-bit quantized
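As a rough illustration of the frequency-wise idea described in the abstract, the sketch below splits a signal into subbands and then runs a loop whose length depends on the number of subbands, not on the number of samples. The FFT-mask band split, the function names `split_subbands` and `far_generate`, and the `toy_model` callable are all assumptions made for illustration; they are not the paper's filter bank or network.

```python
import numpy as np


def split_subbands(x, n_bands):
    """Split a real signal into n_bands subbands by masking FFT bins.

    Assumption: plain FFT masking stands in for whatever analysis
    filter bank an actual vocoder would use.
    """
    spec = np.fft.rfft(x)
    # Bin edges that partition the one-sided spectrum into n_bands ranges.
    edges = np.linspace(0, len(spec), n_bands + 1).astype(int)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spec)
        masked[lo:hi] = spec[lo:hi]
        bands.append(np.fft.irfft(masked, n=len(x)))
    return np.stack(bands)


def far_generate(toy_model, n_bands, n_samples):
    """Generate subbands one after another, each conditioned on the last.

    toy_model(k, prev) is a hypothetical stand-in for a network that
    emits all samples of subband k in a single parallel call.
    Returns the stacked subbands and the number of sequential steps.
    """
    prev = np.zeros(n_samples)  # nothing to condition on for the first band
    bands = []
    for k in range(n_bands):
        prev = toy_model(k, prev)
        bands.append(prev)
    return np.stack(bands), len(bands)
```

Because the masks partition the spectrum, summing the subbands returned by `split_subbands` recovers the input signal. In `far_generate`, the sequential step count equals `n_bands` regardless of `n_samples`, whereas a time-wise autoregressive loop would need one step per sample.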
Related papers
- Analysis By Adversarial Synthesis -- A Novel Approach For Speech Vocoding (2019)
- A Post Auto-regressive GAN Vocoder Focused On Spectrum Fracture (2022)
- Very Low Complexity Speech Synthesis Using Framewise Autoregressive GAN (FARGAN) With Pitch Prediction (2024)
- TFGAN: Time And Frequency Domain Based Generative Adversarial Network For High-fidelity Speech Synthesis (2020)
- HiFi-GAN: Generative Adversarial Networks For Efficient And High Fidelity Speech Synthesis (2020)
- Generating Diverse And Natural Text-to-speech Samples Using A Quantized Fine-grained VAE And Auto-regressive Prosody Prior (2020)
- Paraformer: Fast And Accurate Parallel Transformer For Non-autoregressive End-to-end Speech Recognition (2022)
- Fast And High-quality Auto-regressive Speech Synthesis Via Speculative Decoding (2024)