UniAdapter: Unified Parameter-efficient Transfer Learning For Cross-modal Modeling
2023 · Haoyu Lu, Yuqi Huo, Guoxing Yang, et al.
Abstract
Large-scale vision-language pre-trained models have shown promising transferability to various downstream tasks. As the size of these foundation models and the number of downstream tasks grow, the standard full fine-tuning paradigm becomes unsustainable due to heavy computational and storage costs. This paper proposes UniAdapter, which unifies unimodal and multimodal adapters for parameter-efficient cross-modal adaptation on pre-trained vision-language models. Specifically, adapters are distributed to different modalities and their interactions, with the total number of tunable parameters reduced by partial weight sharing. The unified and knowledge-sharing design enables powerful cross-modal representations that can benefit various downstream tasks, requiring only 1.0%-2.0% tunable parameters of the pre-trained model. Extensive experiments on 6 cross-modal downstream benchmarks (including video-text retrieval, image-text retrieval, VideoQA, and VQA) show that in most cases, UniAdapter
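To make the partial weight sharing idea concrete, the following is a minimal parameter-counting sketch, not the authors' implementation. It assumes each adapter is a standard bottleneck (a down-projection followed by an up-projection) and that the shared part is the down-projection; the abstract does not say which weights are shared, and the hidden size, bottleneck width, and branch count below are hypothetical values chosen only for illustration.

```python
def bottleneck_params(d_model, r):
    """Return (down, up) parameter counts for one bottleneck adapter.

    down: linear layer d_model -> r, weights plus biases
    up:   linear layer r -> d_model, weights plus biases
    """
    down = d_model * r + r
    up = r * d_model + d_model
    return down, up


def adapter_budget(d_model, r, n_branches, share_down):
    """Tunable parameters for n_branches adapters attached to one layer.

    share_down=True is an ASSUMED sharing scheme: every branch reuses a
    single down-projection and keeps its own up-projection.
    """
    down, up = bottleneck_params(d_model, r)
    n_down = 1 if share_down else n_branches
    return n_down * down + n_branches * up


# Hypothetical setting: hidden size 768, bottleneck width 128, and three
# branches (visual, textual, cross-modal) on the same layer.
independent = adapter_budget(768, 128, n_branches=3, share_down=False)
shared = adapter_budget(768, 128, n_branches=3, share_down=True)
saved = independent - shared

print(f"independent adapters: {independent} parameters")
print(f"shared down-projection: {shared} parameters")
print(f"saved by sharing: {saved} parameters")
```

Under these assumptions, sharing removes one down-projection for every branch beyond the first, so the saving grows with the number of branches while each branch keeps its own up-projection.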
Related papers
- Cross-modal Adapter: Parameter-efficient Transfer Learning Approach For Vision-language Models (2024)
- UCDR-Adapter: Exploring Adaptation Of Pre-trained Vision-language Models For Universal Cross-domain Retrieval (2024)
- Efficient And Versatile Robust Fine-tuning Of Zero-shot Models (2024)
- MV-Adapter: Multimodal Video Transfer Learning For Video Text Retrieval (2023)
- Multiway-adapater: Adapting Large-scale Multi-modal Models For Scalable Image-text Retrieval (2023)
- Parameter-efficient Sparse Retrievers And Rerankers Using Adapters (2023)
- Understanding Retrieval-augmented Task Adaptation For Vision-language Models (2024)
- QueryAdapter: Rapid Adaptation Of Vision-language Models In Response To Natural Language Queries (2025)