Efficient Fine-tuning Methodology Of Text Embedding Models For Information Retrieval: Contrastive Learning Penalty (CLP)
2024 · Jeongsu Yu
Abstract
Text embedding models play a crucial role in natural language processing, particularly in information retrieval, and the recent adoption of RAG (Retrieval-Augmented Generation) has made them more important still. This study presents an efficient fine-tuning methodology, covering data selection, loss function, and model architecture, that improves the information retrieval performance of pre-trained text embedding models. In particular, it proposes a novel Contrastive Learning Penalty (CLP) function that overcomes the limitations of existing contrastive learning. The proposed methodology achieves significant performance improvements over existing methods in document retrieval tasks. This study is expected to contribute to improving information retrieval systems through fine-tuning of text embedding models. The code for this study can be found at https://github.com/CreaLabs/Enhanced-BGE-M3-with-CLP-and-MoE, and the best-performing model can be found at h
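For orientation, the "existing contrastive learning" that the abstract contrasts against is commonly an InfoNCE-style objective. The sketch below shows that baseline only, for a single query. It is not the proposed CLP loss, which the abstract does not define. The function name `info_nce_loss`, the temperature of 0.05, and the toy vectors are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """InfoNCE-style contrastive loss for one query (baseline, NOT the CLP loss).

    The temperature value is an illustrative assumption.
    """
    q = normalize(query)
    candidates = [normalize(positive)] + [normalize(n) for n in negatives]
    logits = [dot(q, c) / temperature for c in candidates]
    # Numerically stable log-sum-exp over the positive and all negatives.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
    # Negative log-probability assigned to the positive document.
    return log_denominator - logits[0]


# Toy 2-D embeddings (hypothetical data).
query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = info_nce_loss(query, positive, negatives)
# Swap roles: treat an unrelated document as the "positive".
loss_bad = info_nce_loss(query, negatives[0], [positive, negatives[1]])
print(loss_good < loss_bad)
```

The loss is small when the query is closer to its positive than to every negative, and grows when a negative outranks the positive. Note that this baseline only compares each negative against the query.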
Related papers
- Optimizing CLIP Models For Image Retrieval With Maintained Joint-embedding Alignment (2024)
- Improving Embedding With Contrastive Fine-tuning On Small Datasets With Expert-augmented Scores (2024)
- Finetuning CLIP To Reason About Pairwise Differences (2024)
- Jina CLIP: Your CLIP Model Is Also Your Text Retriever (2024)
- Enhancing Image Retrieval : A Comprehensive Study On Photo Search Using The CLIP Mode (2024)
- Refining Joint Text And Source Code Embeddings For Retrieval Task With Parameter-efficient Fine-tuning (2024)
- NV-Retriever: Improving Text Embedding Models With Effective Hard-negative Mining (2024)
- RzenEmbed: Towards Comprehensive Multimodal Retrieval (2025)