Exploring Distributed Vector Databases Performance On HPC Platforms: A Study With Qdrant
2025 · Seth Ockerman, Amal Gueroudji, Song Young Oh, et al.
Abstract
Vector databases have rapidly grown in popularity, enabling efficient similarity search over data such as text, images, and video. They now play a central role in modern AI workflows, aiding large language models by grounding model outputs in external literature through retrieval-augmented generation. Despite their importance, little is known about the performance characteristics of vector databases in high-performance computing (HPC) systems that drive large-scale science. This work presents an empirical study of distributed vector database performance on the Polaris supercomputer in the Argonne Leadership Computing Facility. We construct a realistic biological-text workload from BV-BRC and generate embeddings from the peS2o corpus using Qwen3-Embedding-4B. We select Qdrant to evaluate insertion, index construction, and query latency with up to 32 workers. Informed by practical lessons from our experience, this work takes a first step toward characterizing vector database performance
Related papers
- Passing The Baton: High Throughput Distributed Disk-based Vector Search With Batann (2025)
- Exqutor: Extended Query Optimizer For Vector-augmented Analytical Queries (2025)
- Vector Database Management Systems: Fundamental Concepts, Use-cases, And Current Challenges (2023)
- Reveal Hidden Pitfalls And Navigate Next Generation Of Vector Similarity Search From Task-centric Views (2025)
- Cracking Vector Search Indexes (2025)
- Frequency-aware Graph Construction And Search For Dynamic Vector Databases (2025)
- Starling: An I/O-efficient Disk-resident Graph Index Framework For High-dimensional Vector Similarity Search On Data Segment (2024)
- SQUASH: Serverless And Distributed Quantization-based Attributed Vector Similarity Search (2025)