Mafin: Enhancing Black-box Embeddings With Model Augmented Fine-tuning
2024 · Mingtian Zhang, Shawn Lan, Peter Hayes, et al.
Abstract
Retrieval Augmented Generation (RAG) has emerged as an effective solution for mitigating hallucinations in Large Language Models (LLMs). The retrieval stage in RAG typically involves a pre-trained embedding model, which converts queries and passages into vectors to capture their semantics. However, a standard pre-trained embedding model may exhibit sub-optimal performance when applied to specific domain knowledge, necessitating fine-tuning. This paper addresses scenarios where the embeddings are only available from a black-box model. We introduce Model augmented fine-tuning (Mafin), a novel approach for fine-tuning a black-box embedding model by augmenting it with a trainable embedding model. Our results demonstrate that Mafin significantly enhances the performance of the black-box embeddings while requiring only the training of a small augmented model. We validate the effectiveness of our method on both labeled and unlabeled datasets, illustrating its broad applicability and efficiency.
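The abstract does not spell out how the black-box and trainable embeddings are combined. As a rough illustration only, the sketch below assumes the two vectors are each unit-normalized, concatenated, and rescaled so the result is again a unit vector; under that assumption the retrieval score is the average of the two models' cosine similarities. The function names (`l2_normalize`, `augment`, `dot`) and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Return vec scaled to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def augment(black_box_vec, trainable_vec):
    """Combine a frozen black-box embedding with a trainable one.

    Assumption (not confirmed by the abstract): concatenate the two
    unit vectors and rescale by 1/sqrt(2) so the result has unit norm.
    """
    joined = l2_normalize(black_box_vec) + l2_normalize(trainable_vec)
    scale = 1.0 / math.sqrt(2.0)
    return [x * scale for x in joined]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for the outputs of the two models.
query_bb, query_small = [0.6, 0.8], [1.0, 0.0]
passage_bb, passage_small = [0.8, 0.6], [0.0, 1.0]

q = augment(query_bb, query_small)
p = augment(passage_bb, passage_small)

# With this construction the score is the mean of the two cosine similarities.
score = dot(q, p)
```

In a setup like this only the small model's parameters would be updated during fine-tuning, while the black-box vectors stay fixed and can be cached.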
Related papers
- REFINE On Scarce Data: Retrieval Enhancement Through Fine-tuning Via Model Fusion Of Embedding Models (2024)
- LMAR: Language Model Augmented Retriever For Domain-specific Knowledge Indexing (2025)
- Re-ranking The Context For Multimodal Retrieval Augmented Generation (2025)
- Llm-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024)
- MLLM Is A Strong Reranker: Advancing Multimodal Retrieval-augmented Generation Via Knowledge-enhanced Reranking And Noise-injected Training (2024)
- Rethinking Hybrid Retrieval: When Small Embeddings And LLM Re-ranking Beat Bigger Models (2025)
- Cafe: Unifying Representation And Generation With Contrastive-autoregressive Finetuning (2025)
- Improving Embedding With Contrastive Fine-tuning On Small Datasets With Expert-augmented Scores (2024)