Evaluating Embedding APIs For Information Retrieval
2023 · Ehsan Kamalloo, Xinyu Zhang, Odunayo Ogundepo, et al.
Abstract
The ever-increasing size of language models curtails their widespread availability to the community, thereby galvanizing many companies into offering access to large language models through APIs. One particular type, suitable for dense retrieval, is a semantic embedding service that builds vector representations of input text. With a growing number of publicly available APIs, our goal in this paper is to analyze existing offerings in realistic retrieval scenarios, to assist practitioners and researchers in finding suitable services according to their needs. Specifically, we investigate the capabilities of existing semantic embedding APIs on domain generalization and multilingual retrieval. For this purpose, we evaluate these services on two standard benchmarks, BEIR and MIRACL. We find that re-ranking BM25 results using the APIs is a budget-friendly approach and is most effective in English, in contrast to the standard practice of employing them as first-stage retrievers. For non-Engli
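The re-ranking setup the abstract describes (embedding-based scoring applied to BM25 candidates rather than to the whole corpus) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_embed` is a hypothetical stand-in for a call to a semantic embedding API, the candidate list is assumed to come from a separate BM25 first stage, and cosine similarity is assumed as the scoring function.

```python
import math
import zlib
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]


def toy_embed(text: str, dim: int = 256) -> Vector:
    """Hypothetical stand-in for an embedding API call.

    Builds a hashed bag-of-words vector so the sketch runs offline.
    A real service would return a learned dense vector instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        # crc32 is deterministic across runs, unlike the built-in hash().
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(
    query: str,
    candidates: Sequence[Tuple[str, str]],
    embed: Callable[[str], Vector] = toy_embed,
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Re-rank (doc_id, text) candidates by similarity to the query.

    Only the candidates are embedded, which is what keeps the number
    of embedding calls small compared with indexing a full corpus.
    """
    query_vec = embed(query)
    scored = [(doc_id, cosine(query_vec, embed(text))) for doc_id, text in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored if top_k is None else scored[:top_k]


if __name__ == "__main__":
    # Assumed output of a BM25 first stage: (doc_id, text) pairs.
    bm25_candidates = [
        ("d1", "stock market prices fell sharply"),
        ("d2", "dense retrieval with embedding models"),
        ("d3", "recipes for cooking pasta"),
    ]
    for doc_id, score in rerank("embedding models for retrieval", bm25_candidates):
        print(doc_id, round(score, 3))
```

In this arrangement the embedding service is called once for the query and once per candidate, so cost scales with the size of the BM25 candidate list and not with the size of the collection.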
Related papers
- Medeir: A Specialized Medical Embedding Model For Enhanced Information Retrieval (2025) · 0.00
- IRSC: A Zero-shot Evaluation Benchmark For Information Retrieval Through Semantic Comprehension In Retrieval-augmented Generation Scenarios (2024) · 2.86
- BEIR: A Heterogenous Benchmark For Zero-shot Evaluation Of Information Retrieval Models (2021) · 6.67
- Large Reasoning Embedding Models: Towards Next-generation Dense Retrieval Paradigm (2025) · 0.00
- LLM-augmented Retrieval: Enhancing Retrieval Models Through Language Models And Doc-level Embedding (2024) · 0.00
- Search-adaptor: Embedding Customization For Information Retrieval (2023) · 0.00
- Bridging Language And Items For Retrieval And Recommendation: Benchmarking LLMs As Semantic Encoders (2024) · 0.00
- What Drives Cross-lingual Ranking? Retrieval Approaches With Multilingual Language Models (2025) · 0.00