Language-based Audio Retrieval With Converging Tied Layers And Contrastive Loss
2022 · Andrew Koh, Eng Siong Chng
Abstract
In this paper, we tackle the new Language-Based Audio Retrieval task proposed in DCASE 2022. First, we introduce a simple, scalable architecture that ties the audio and text encoders together. Second, we show that combining this architecture with a contrastive loss allows the model to significantly outperform the baseline model. Finally, in addition to having an extremely low training memory requirement, the approach lets us use pretrained models as is, without fine-tuning them. We test our methods and show that combining them beats the baseline scores significantly.
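The abstract does not spell out the layer layout or the exact loss, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes frozen pretrained encoders have already produced fixed-size audio and text features, projects each modality into a common width, passes both through one shared ("tied") weight matrix, and scores the batch with a symmetric InfoNCE-style contrastive loss. The class name `TiedProjectionHead`, the feature sizes (2048, 768, 256), the ReLU, and the temperature of 0.07 are all illustrative assumptions.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """InfoNCE-style loss (an assumed form): row i of each input is a matching pair."""
    a = l2_normalize(audio_emb)
    t = l2_normalize(text_emb)
    logits = a @ t.T / temperature           # (batch, batch) similarity matrix
    idx = np.arange(len(a))                  # matching pairs sit on the diagonal
    loss_a2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # audio -> text
    loss_t2a = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> audio
    return 0.5 * (loss_a2t + loss_t2a)


class TiedProjectionHead:
    """Hypothetical head: per-modality input projections, then one shared layer."""

    def __init__(self, audio_dim, text_dim, shared_dim, rng):
        self.w_audio = rng.normal(0.0, audio_dim ** -0.5, (audio_dim, shared_dim))
        self.w_text = rng.normal(0.0, text_dim ** -0.5, (text_dim, shared_dim))
        # The same matrix is applied to both modalities (the "tied" part).
        self.w_tied = rng.normal(0.0, shared_dim ** -0.5, (shared_dim, shared_dim))

    def _shared(self, h):
        return np.maximum(h, 0.0) @ self.w_tied   # ReLU, then the tied weights

    def encode_audio(self, features):
        return self._shared(features @ self.w_audio)

    def encode_text(self, features):
        return self._shared(features @ self.w_text)


rng = np.random.default_rng(0)
head = TiedProjectionHead(audio_dim=2048, text_dim=768, shared_dim=256, rng=rng)

# Stand-ins for features from frozen pretrained encoders (random here).
audio_features = rng.normal(size=(4, 2048))
text_features = rng.normal(size=(4, 768))

audio_emb = head.encode_audio(audio_features)
text_emb = head.encode_text(text_features)
loss = symmetric_contrastive_loss(audio_emb, text_emb)
```

Because the pretrained encoders stay frozen in this sketch, only the three small matrices would be trained, which is one plausible reading of the low training memory requirement mentioned above. At retrieval time, captions and audio clips would be ranked by the same cosine similarity used inside the loss.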
Related papers
- Matching Text And Audio Embeddings: Exploring Transfer-learning Strategies For Language-based Audio Retrieval (2022)
- Improving Natural-language-based Audio Retrieval With Transfer Learning And Audio & Text Augmentations (2022)
- Contrastive Audio-language Learning For Music (2022)
- Advancing Natural-language Based Audio Retrieval With PaSST And Large Audio-caption Data Sets (2023)
- Estimated Audio-caption Correspondences Improve Language-based Audio Retrieval (2024)
- Sequential Contrastive Audio-visual Learning (2024)
- Pretrained Conformers For Audio Fingerprinting And Retrieval (2025)
- Unsupervised Dense Information Retrieval With Contrastive Learning (2021)