Integrating Continuous And Binary Relevances In Audio-text Relevance Learning
2024 · Huang Xie, Khazar Khorrami, Okko Räsänen, et al.
Abstract
Audio-text relevance learning refers to learning the shared semantic properties of audio samples and textual descriptions. The standard approach uses binary relevances derived from pairs of audio samples and their human-provided captions, categorizing each pair as either positive or negative. This may result in suboptimal systems due to varying levels of relevance between audio samples and captions. In contrast, a recent study used human-assigned relevance ratings, i.e., continuous relevances, for these pairs but did not obtain performance gains in audio-text relevance learning. This work introduces a relevance learning method that utilizes both human-assigned continuous relevance ratings and binary relevances using a combination of a listwise ranking objective and a contrastive learning objective. Experimental results demonstrate the effectiveness of the proposed method, showing improvements in language-based audio retrieval, a downstream task in audio-text relevance learning. In addi
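The abstract names two training signals, a listwise ranking objective over continuous relevance ratings and a contrastive objective over binary relevances, but does not give their exact form. The following is a minimal sketch of how such a combination could look, not the authors' formulation. It assumes a ListNet-style cross-entropy for the listwise term and a symmetric InfoNCE loss for the contrastive term. The function names, the softmax over ratings, the `temperature` value, and the `weight` mixing factor are all hypothetical choices made for illustration.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    log_total = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - log_total for x in xs]


def listwise_loss(scores, ratings):
    """ListNet-style loss (assumed form): cross-entropy between the
    softmax of human ratings and the softmax of model scores for one
    query and its list of candidates."""
    target = [math.exp(v) for v in log_softmax(ratings)]
    log_pred = log_softmax(scores)
    return -sum(t * lp for t, lp in zip(target, log_pred))


def contrastive_loss(sim, temperature=0.07):
    """Symmetric InfoNCE (assumed form) over a square audio-text
    similarity matrix; entry sim[i][i] is the binary-positive pair."""
    n = len(sim)
    loss = 0.0
    for i in range(n):
        row = [sim[i][j] / temperature for j in range(n)]  # audio -> text
        col = [sim[j][i] / temperature for j in range(n)]  # text -> audio
        loss -= log_softmax(row)[i]
        loss -= log_softmax(col)[i]
    return loss / (2 * n)


def combined_loss(sim, rated_scores, ratings, weight=0.5):
    """Weighted sum of the two terms; `weight` is a hypothetical knob."""
    return contrastive_loss(sim) + weight * listwise_loss(rated_scores, ratings)


if __name__ == "__main__":
    # Toy batch: 2 audio clips x 2 captions, matched pairs on the diagonal.
    sim = [[0.9, 0.1], [0.2, 0.8]]
    # One query with three rated candidates (ratings are made-up numbers).
    model_scores = [0.7, 0.4, 0.1]
    human_ratings = [3.0, 2.0, 1.0]
    print(combined_loss(sim, model_scores, human_ratings))
```

In this sketch the contrastive term sees only which pairs are matched, while the listwise term penalizes model scores whose ordering and spacing disagree with the graded ratings, which is one way the two kinds of relevance could complement each other.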
Related papers
- Text-based Audio Retrieval By Learning From Similarities Between Audio Captions (2024) · 2.26
- Segment Relevance Estimation For Audio Analysis And Weakly-labelled Classification (2019) · 0.00
- Connecting The Dots Between Audio And Text Without Parallel Data Through Visual Knowledge Transfer (2021) · 8.09
- Contrastive Latent Space Reconstruction Learning For Audio-text Retrieval (2023) · 3.58
- Interpretable Representation Learning For Speech And Audio Signals Based On Relevance Weighting (2020) · 9.59
- Enhancing Retrieval-augmented Audio Captioning With Generation-assisted Multimodal Querying And Progressive Learning (2024) · 3.58
- On Negative Sampling For Contrastive Audio-text Retrieval (2022) · 0.00
- Introducing Auxiliary Text Query-modifier To Content-based Audio Retrieval (2022) · 0.00