TTS-by-TTS 2: Data-selective Augmentation For Neural Speech Synthesis Using Ranking Support Vector Machine With Variational Autoencoder
2022 · Eunwoo Song, Ryuichi Yamamoto, Ohsung Kwon, et al.
Abstract
Recent advances in synthetic speech quality have enabled us to train text-to-speech (TTS) systems by using synthetic corpora. However, merely increasing the amount of synthetic data is not always advantageous for improving training efficiency. Our aim in this study is to selectively choose synthetic data that are beneficial to the training process. In the proposed method, we first adopt a variational autoencoder whose posterior distribution is utilized to extract latent features representing acoustic similarity between the recorded and synthetic corpora. By using those learned features, we then train a ranking support vector machine (RankSVM) that is well known for effectively ranking relative attributes among binary classes. By setting the recorded and synthetic corpora as two opposite classes, RankSVM is used to determine how acoustically similar the synthesized speech is to the recorded data. Then, synthetic TTS data, whose distribution is close to the recorded data, are selected from
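A minimal sketch of the ranking-and-selection step described above, assuming each utterance has already been mapped to a fixed-length latent vector (for instance, the mean of the VAE posterior). The linear RankSVM here is trained by plain stochastic subgradient descent on a pairwise hinge loss rather than by a dedicated SVM solver. The function names, hyperparameters, the top-k selection rule, and the two-dimensional toy vectors are all hypothetical illustrations, not the authors' implementation.

```python
import random

# Hypothetical toy latent vectors standing in for VAE posterior means.
RECORDED = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.2)]
SYNTHETIC = [
    (0.9, 0.9),    # close to the recorded cluster
    (-1.0, -1.0),  # far from it
    (1.0, 0.8),    # close
    (-0.8, -1.2),  # far
    (-1.1, -0.9),  # far
    (0.8, 1.0),    # close
]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_rank_svm(recorded, synthetic, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Fit a linear RankSVM with a pairwise hinge loss using SGD.

    Each (recorded, synthetic) pair is a preference constraint: the recorded
    vector should outscore the synthetic one by a margin of 1.
    Objective: lam/2 * ||w||^2 + mean over pairs of max(0, 1 - w . (r - s)).
    """
    rng = random.Random(seed)
    w = [0.0] * len(recorded[0])
    pairs = [(r, s) for r in recorded for s in synthetic]
    for _ in range(epochs):
        rng.shuffle(pairs)
        for r, s in pairs:
            diff = [a - b for a, b in zip(r, s)]
            if dot(w, diff) < 1.0:
                # Margin violated: step along regularizer gradient minus the pair difference.
                w = [wi - lr * (lam * wi - di) for wi, di in zip(w, diff)]
            else:
                # Margin satisfied: only the regularizer shrinks w.
                w = [wi - lr * lam * wi for wi in w]
    return w


def select_synthetic(w, synthetic, k):
    """Return indices of the k synthetic items ranked most recorded-like."""
    scores = [dot(w, s) for s in synthetic]
    ranked = sorted(range(len(synthetic)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    w = train_rank_svm(RECORDED, SYNTHETIC)
    print("weights:", w)
    print("selected synthetic indices:", select_synthetic(w, SYNTHETIC, k=3))
```

Because every recorded item is paired with every synthetic one, the number of constraints grows with the product of the two corpus sizes; for real corpora one would likely subsample pairs or fit an off-the-shelf linear SVM on the difference vectors instead. Keeping a fixed top-k is only one possible selection rule; a threshold on the RankSVM score would plug in the same way.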
Related papers
- TTS-by-TTS: TTS-driven Data Augmentation For Fast And High-quality Speech Synthesis (2020) · 9.59
- Conditional Variational Autoencoder With Adversarial Learning For End-to-end Text-to-speech (2021) · 0.00
- On The Problem Of Text-to-speech Model Selection For Synthetic Data Generation In Automatic Speech Recognition (2024) · 4.52
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019) · 12.68
- Low-resource Expressive Text-to-speech Using Data Augmentation (2020) · 11.29
- Improving Accented Speech Recognition Using Data Augmentation Based On Unsupervised Text-to-speech Synthesis (2024) · 0.00
- Text-to-speech Synthesis From Dark Data With Evaluation-in-the-loop Data Selection (2022) · 7.50
- You Do Not Need More Data: Improving End-to-end Speech Recognition By Text-to-speech Data Augmentation (2020) · 11.49