AudioCaps
Emerging, 11 papers using it
First seen: 2023
AudioCaps is a dataset of audio clips paired with descriptive captions. The papers listed here use it for audio captioning and audio-language modeling, and to evaluate video-to-audio generation methods.
Papers using AudioCaps (11)
- FoleyGenEx: Unified Video-to-Audio Generation with Multi-Modal Control, Temporal Alignment, and Semantic Precision
- e5-omni: Explicit Cross-modal Alignment for Omni-modal Embeddings
- LAMB: LLM-based Audio Captioning with Modality Gap Bridging via Cauchy-Schwarz Divergence
- AC/DC: LLM-based Audio Comprehension via Dialogue Continuation
- Training-free Multimodal Guidance For Video To Audio Generation
- Mitigating Audiovisual Mismatch In Visual-guide Audio Captioning
- DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
- Audio-Visual LLM for Video Understanding
- Zero-Shot Audio Captioning Using Soft and Hard Prompts
- MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
- Enhancing Audio-Language Models through Self-Supervised Post-Training with Text-Audio Pairs