AudioSetMix: Enhancing Audio-language Datasets With LLM-assisted Augmentations
2024 · David Xu
Abstract
Multi-modal learning in the audio-language domain has seen significant advancements in recent years. However, audio-language learning faces challenges due to limited and lower-quality data compared to image-language tasks. Existing audio-language datasets are notably smaller, and manual labeling is hindered by the need to listen to entire audio clips for accurate labeling. Our method systematically generates audio-caption pairs by augmenting audio clips with natural language labels and corresponding audio signal processing operations. Leveraging a Large Language Model, we generate descriptions of augmented audio clips with a prompt template. This scalable method produces AudioSetMix, a high-quality training dataset for text-and-audio models. Integrating our dataset improves model performance on benchmarks by providing more diverse and better-aligned examples. Notably, our dataset addresses the absence of modifiers (adjectives and adverbs) in existing datasets. By enablin
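The pipeline the abstract describes (apply a signal processing operation, record a matching natural language label, then fill a prompt template for an LLM) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and does not come from the paper: the `AugmentedClip` container, the two operations (`change_volume`, `change_speed`), the modifier wording, and the text of `PROMPT_TEMPLATE`. The LLM call itself is left out; the sketch stops at the prompt string.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class AugmentedClip:
    samples: List[float]     # mono audio samples in [-1.0, 1.0]
    label: str               # original natural language label of the clip
    operations: List[str]    # plain-text record of each edit applied so far


def sine_clip(n_samples: int = 1600, freq_hz: float = 440.0,
              sample_rate: int = 16000) -> List[float]:
    """Synthetic stand-in for a real audio clip (hypothetical helper)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


def change_volume(clip: AugmentedClip, gain: float) -> AugmentedClip:
    """Scale the amplitude, clip to [-1, 1], and log a matching modifier."""
    samples = [max(-1.0, min(1.0, s * gain)) for s in clip.samples]
    modifier = "made louder" if gain > 1.0 else "made quieter"
    return AugmentedClip(samples, clip.label, clip.operations + [modifier])


def change_speed(clip: AugmentedClip, factor: float) -> AugmentedClip:
    """Naive nearest-index resampling; a real pipeline would use a proper resampler."""
    last = len(clip.samples) - 1
    n_out = int(len(clip.samples) / factor)
    samples = [clip.samples[min(int(i * factor), last)] for i in range(n_out)]
    modifier = "sped up" if factor > 1.0 else "slowed down"
    return AugmentedClip(samples, clip.label, clip.operations + [modifier])


# Illustrative template only; the paper's actual prompt is not shown in the abstract.
PROMPT_TEMPLATE = (
    "Write a one-sentence caption for an audio clip.\n"
    "Sound: {label}\n"
    "Edits applied: {operations}\n"
    "Use adjectives or adverbs that reflect the edits."
)


def build_prompt(clip: AugmentedClip) -> str:
    """Fill the template with the label and the recorded edits."""
    ops = ", ".join(clip.operations) or "none"
    return PROMPT_TEMPLATE.format(label=clip.label, operations=ops)


clip = AugmentedClip(sine_clip(), "a dog barking", [])
clip = change_speed(change_volume(clip, 0.5), 2.0)
prompt = build_prompt(clip)
print(prompt)
```

Because each operation appends its own description, the caption prompt stays aligned with what was actually done to the waveform, which is the property the abstract attributes to generating modifiers alongside the augmentations.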
Related papers
- AudioSetCaps: An Enriched Audio-caption Dataset Using Automated Generation Pipeline With Large Audio And Language Models (2024)
- Auto-ACD: A Large-scale Dataset For Audio-language Representation Learning (2023)
- Improving Audio Captioning Models With Fine-grained Audio Features, Text Embedding Supervision, And LLM Mix-up Augmentation (2023)
- Exploring Train And Test-time Augmentations For Audio-language Learning (2022)
- Performance Improvement Of Language-queried Audio Source Separation Based On Caption Augmentation From Large Language Models For DCASE Challenge 2024 Task 9 (2024)
- Sound-VECaps: Improving Audio Generation With Visual Enhanced Captions (2024)
- Enhancing Automated Audio Captioning Via Large Language Models With Optimized Audio Encoding (2024)
- MixSpeech: Cross-modality Self-learning With Audio-visual Stream Mixup For Visual Speech Translation And Recognition (2023)