Resource-efficient Reference-free Evaluation Of Audio Captions
2024 · Rehana Mahfuz, Yinyi Guo, Erik Visser
Abstract
To establish the trustworthiness of systems that automatically generate text captions for audio, images, and video, existing reference-free metrics rely on large pretrained models, which are impractical to deploy in resource-constrained settings. To address this, we propose several metrics that elicit the captioning model's confidence in its own generation. To assess how well these metrics can replace correctness measures that rely on reference captions, we test how well calibrated they are against those correctness measures. We discuss why some of these confidence metrics align better with certain correctness measures. Further, we provide insight into why temperature scaling of confidence metrics is effective. Our main contribution is a suite of well-calibrated, lightweight confidence metrics for reference-free evaluation of captions in resource-constrained settings.
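The abstract does not spell out the individual metrics, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes one possible confidence metric (the mean probability of the generated tokens under a temperature-scaled softmax) and one common calibration measure (expected calibration error against a reference-based correctness score normalized to [0, 1]). The function names `sequence_confidence` and `expected_calibration_error` are hypothetical and are not taken from the paper.

```python
import math


def sequence_confidence(token_logits, chosen_ids, temperature=1.0):
    """Mean probability of the generated tokens (an assumed confidence metric).

    token_logits: one list of vocabulary logits per decoding step.
    chosen_ids:   index of the token emitted at each step.
    temperature:  softmax temperature; values above 1 flatten the distribution.
    """
    probs = []
    for logits, idx in zip(token_logits, chosen_ids):
        scaled = [z / temperature for z in logits]
        peak = max(scaled)  # subtract the max for numerical stability
        exps = [math.exp(z - peak) for z in scaled]
        probs.append(exps[idx] / sum(exps))
    return sum(probs) / len(probs)


def expected_calibration_error(confidences, correctness, n_bins=10):
    """Weighted mean over confidence bins of |avg confidence - avg correctness|.

    correctness is assumed to be a reference-based score scaled to [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for conf, corr in zip(confidences, correctness):
        b = min(int(conf * n_bins), n_bins - 1)
        bins[b].append((conf, corr))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        avg_conf = sum(c for c, _ in items) / len(items)
        avg_corr = sum(y for _, y in items) / len(items)
        ece += len(items) / total * abs(avg_conf - avg_corr)
    return ece


# Toy example: two decoding steps over a three-token vocabulary.
logits = [[2.0, 0.5, 0.1], [0.2, 3.0, 0.3]]
ids = [0, 1]
conf_t1 = sequence_confidence(logits, ids, temperature=1.0)
conf_t2 = sequence_confidence(logits, ids, temperature=2.0)
```

In this toy example the emitted tokens are the highest-scoring ones, so raising the temperature lowers the confidence. In a sketch like this, the temperature would be tuned on held-out captions so that the calibration error against the chosen correctness measure is as small as possible. Both functions need only the decoder's own logits, which is what makes this kind of metric cheap compared with running a separate large pretrained evaluator.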
Related papers
- Semantic-aware Confidence Calibration For Automated Audio Captioning (2025) 0.00
- Can Audio Captions Be Evaluated With Image Caption Metrics? (2021) 13.54
- Investigations In Audio Captioning: Addressing Vocabulary Imbalance And Evaluating Suitability Of Language-centric Performance Metrics (2022) 0.00
- CLAIR-A: Leveraging Large Language Models To Judge Audio Captions (2024) 2.00
- A Reference-less Quality Metric For Automatic Speech Recognition Via Contrastive-learning Of A Multi-language Model With Self-supervision (2023) 2.51
- Cosyaudio: Improving Audio Generation With Confidence Scores And Synthetic Captions (2025) 0.00
- Text-to-audio Grounding Based Novel Metric For Evaluating Audio Caption Similarity (2022) 0.00
- Parameter Efficient Audio Captioning With Faithful Guidance Using Audio-text Shared Latent Representation (2023) 3.58