Addressing Index Collapse Of Large-codebook Speech Tokenizer With Dual-decoding Product-quantized Variational Auto-encoder
2024 · Haohan Guo, Fenglong Xie, Dongchao Yang, et al.
Abstract
VQ-VAE, a mainstream approach to speech tokenization, suffers from "index collapse", where only a small number of codewords are activated in large codebooks. This work proposes a product-quantized (PQ) VAE, which uses more codebooks with fewer codewords each, to address this problem and build large-codebook speech tokenizers. It encodes speech features into multiple VQ subspaces and composes them into codewords in a larger codebook. In addition, to make good use of each VQ subspace, we enhance PQ-VAE with a dual-decoding training strategy that uses both the encoding and quantized sequences. Experimental results demonstrate that PQ-VAE addresses "index collapse" effectively, especially for larger codebooks. The model with the proposed training strategy further improves codebook perplexity and reconstruction quality, outperforming other multi-codebook VQ approaches. Finally, PQ-VAE demonstrates its effectiveness in language-model-based TTS, supporting higher-quality speech generation with larger codebo
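The product-quantization idea in the abstract (several small VQ subspaces whose indices compose into one codeword of a much larger codebook) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the equal-width channel split, the mixed-radix index composition, and the perplexity definition (exponential of the entropy of codeword usage) are all assumptions chosen for illustration, and the dual-decoding training strategy is not modeled here.

```python
import numpy as np


def product_quantize(z, codebooks):
    """Quantize each channel group of z against its own small codebook.

    z:         (T, D) array of encoder features (hypothetical shape).
    codebooks: list of G arrays, each (K, D // G).
    Returns a (T, G) integer array of per-subspace indices.
    """
    groups = np.split(z, len(codebooks), axis=1)  # G arrays of shape (T, D // G)
    indices = []
    for sub, cb in zip(groups, codebooks):
        # Squared Euclidean distance from every frame to every codeword: (T, K)
        dist = ((sub[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        indices.append(dist.argmin(axis=1))
    return np.stack(indices, axis=1)


def compose_index(sub_indices, K):
    """Combine G indices in [0, K) into one index in [0, K**G) (mixed radix)."""
    G = sub_indices.shape[1]
    weights = K ** np.arange(G - 1, -1, -1)  # e.g. [K, 1] for G = 2
    return sub_indices @ weights


def perplexity(indices, codebook_size):
    """exp(entropy) of empirical codeword usage; equals codebook_size if uniform."""
    counts = np.bincount(indices, minlength=codebook_size)
    p = counts / counts.sum()
    p = p[p > 0]  # unused codewords contribute nothing to the entropy
    return float(np.exp(-(p * np.log(p)).sum()))


# Illustrative usage with random data: G = 2 subspaces of K = 4 codewords
# give an effective codebook of 4**2 = 16 composite codewords.
rng = np.random.default_rng(0)
G, K, D, T = 2, 4, 8, 100
codebooks = [rng.normal(size=(K, D // G)) for _ in range(G)]
z = rng.normal(size=(T, D))
sub_idx = product_quantize(z, codebooks)
tokens = compose_index(sub_idx, K)
usage = perplexity(tokens, K ** G)
```

In this sketch, a low `usage` value relative to `K ** G` would correspond to the "index collapse" symptom the abstract describes, while each subspace only ever searches a small codebook of `K` entries.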
Related papers
- Robust Training Of Vector Quantized Bottleneck Models (2020)
- Single-codec: Single-codebook Speech Codec Towards High-performance Speech Generation (2024)
- Improved Prosody From Learned F0 Codebook Representations For VQ-VAE Speech Waveform Reconstruction (2020)
- VQVAE Unsupervised Unit Discovery And Multi-scale Code2spec Inverter For ZeroSpeech Challenge 2019 (2019)
- ERVQ: Enhanced Residual Vector Quantization With Intra-and-inter-codebook Optimization For Neural Audio Codecs (2024)
- Enhancing Into The Codec: Noise Robust Speech Coding With Vector-quantized Autoencoders (2021)
- DelightfulTTS 2: End-to-end Speech Synthesis With Adversarial Vector-quantized Auto-encoders (2022)
- Low Bit-rate Speech Coding With VQ-VAE And A WaveNet Decoder (2019)