Synthio: Augmenting Small-scale Audio Classification Datasets With Synthetic Data
2024 · Sreyan Ghosh, Sonal Kumar, Zhifeng Kong, et al.
Abstract
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data. Our goal is to improve audio classification accuracy with limited labeled data. Traditional data augmentation techniques, which apply artificial transformations (e.g., adding random noise or masking segments), struggle to create data that captures the true diversity present in real-world audio. To address this shortcoming, we propose to augment the dataset with synthetic audio generated from text-to-audio (T2A) diffusion models. However, synthesizing effective augmentations is challenging: the generated data must not only be acoustically consistent with the underlying small-scale dataset but also have sufficient compositional diversity. To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization. This ensures that the acoustic characteristics of the generated data remain consis
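The traditional transformations the abstract contrasts Synthio against (adding random noise, masking segments) can be sketched as follows. This is a minimal illustration of that baseline only, not the Synthio method; the function names `add_noise` and `mask_segment`, the 20 dB signal-to-noise ratio, and the 440 Hz test tone are assumptions chosen for the example.

```python
import math
import random


def add_noise(wave, snr_db, rng):
    """Add white Gaussian noise at a target signal-to-noise ratio in dB."""
    signal_power = sum(x * x for x in wave) / len(wave)
    # SNR(dB) = 10 * log10(signal_power / noise_power)
    noise_std = math.sqrt(signal_power / (10 ** (snr_db / 10)))
    return [x + rng.gauss(0.0, noise_std) for x in wave]


def mask_segment(wave, max_len, rng):
    """Zero out one randomly placed segment of 1..max_len samples.

    Assumes max_len <= len(wave).
    """
    length = rng.randint(1, max_len)            # inclusive bounds
    start = rng.randint(0, len(wave) - length)  # segment stays in range
    return wave[:start] + [0.0] * length + wave[start + length:]


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
rng = random.Random(0)
sample_rate = 16000
clip = [math.sin(2 * math.pi * 440.0 * n / sample_rate)
        for n in range(sample_rate)]

noisy = add_noise(clip, snr_db=20.0, rng=rng)
masked = mask_segment(clip, max_len=1600, rng=rng)
```

Both functions only perturb the samples of an existing clip, which is why such augmentations add little compositional diversity beyond what the small dataset already contains.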
Related papers
- Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition And Speech Modeling? (2024)
- Generating Synthetic Audio Data For Attention-based Speech Recognition Systems (2019)
- Synth2Aug: Cross-domain Speaker Recognition With TTS Synthesized Speech (2020)
- TTS-by-TTS 2: Data-selective Augmentation For Neural Speech Synthesis Using Ranking Support Vector Machine With Variational Autoencoder (2022)
- Low-resource Expressive Text-to-speech Using Data Augmentation (2020)
- TTS-by-TTS: TTS-driven Data Augmentation For Fast And High-quality Speech Synthesis (2020)
- CosyAudio: Improving Audio Generation With Confidence Scores And Synthetic Captions (2025)
- A Framework For Synthetic Audio Conversations Generation Using Large Language Models (2024)