Soft Prompt Tuning For Augmenting Dense Retrieval With Large Language Models
2023 · Zhiyuan Peng, Xuyang Wu, Qifan Wang, et al.
Abstract
Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarity between them in vector space. One of the challenges in DR is the lack of domain-specific training data. While DR models can learn from large-scale public datasets like MS MARCO through transfer learning, evidence shows that not all DR models and domains benefit from transfer learning equally. Recently, some researchers have turned to large language models (LLMs) to improve zero-shot and few-shot DR models. However, the hard prompts or human-written prompts used in these works cannot guarantee the quality of the generated weak queries. To tackle this, we propose soft prompt tuning for augmenting DR (SPTAR): for each task, we leverage soft prompt tuning to optimize a task-specific soft prompt on limited ground-truth data and then prompt the LLMs to tag unlabeled documents with weak queries, yielding enough weak document-query pairs to train task-specific d
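The two ideas in the abstract, scoring by similarity in vector space and tagging unlabeled documents with weak queries, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The hand-made 3-dimensional vectors stand in for the output of a trained encoder, and `toy_generator` is a hypothetical placeholder for an LLM conditioned on a tuned soft prompt. Cosine similarity is assumed here as the similarity measure; the abstract does not say which one is used.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank(query_vec: Vector, doc_vecs: List[Vector]) -> List[int]:
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def build_weak_pairs(
    docs: List[str], generate_query: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Tag each unlabeled document with one generated (weak) query."""
    return [(doc, generate_query(doc)) for doc in docs]


def toy_generator(doc: str) -> str:
    """Hypothetical stand-in for a soft-prompted LLM: echoes the first three words."""
    return " ".join(doc.split()[:3]) + "?"


# Hand-made embeddings, purely illustrative (a real system would use an encoder).
doc_vecs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query_vec = [0.9, 0.1, 0.0]

ranking = rank(query_vec, doc_vecs)
pairs = build_weak_pairs(["dense retrieval maps text to vectors"], toy_generator)
```

In a real pipeline, the weak document-query pairs produced by `build_weak_pairs` would serve as training data for the retriever, and the quality of `generate_query` is what the soft prompt is tuned to improve.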
Related papers
- PromptReps: Prompting Large Language Models To Generate Dense And Sparse Representations For Zero-shot Document Retrieval (2024)
- Soft Prompt Decoding For Multilingual Dense Retrieval (2023)
- Fine-grained Retrieval Prompt Tuning (2022)
- Pseudo Relevance Feedback Is Enough To Close The Gap Between Small And Large Dense Retrieval Models (2025)
- How To Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval (2023)
- ExpandR: Teaching Dense Retrievers Beyond Queries With LLM Guidance (2025)
- Don't Retrieve, Generate: Prompting LLMs For Synthetic Training Data In Dense Retrieval (2025)
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024)