Fast And Scalable Gene Embedding Search: A Comparative Study Of FAISS And ScaNN
2025 · Mohammad Saleh Refahi, Gavin Hearne, Harrison Muller, et al.
Abstract
The exponential growth of DNA sequencing data has outpaced traditional heuristic-based methods, which struggle to scale effectively. Efficient computational approaches are urgently needed to support large-scale similarity search, a foundational task in bioinformatics for detecting homology, functional similarity, and novelty among genomic and proteomic sequences. Although tools like BLAST have been widely used and remain effective in many scenarios, they suffer from limitations such as high computational cost and poor performance on divergent sequences. In this work, we explore embedding-based similarity search methods that learn latent representations capturing deeper structural and functional patterns beyond raw sequence alignment. We systematically evaluate two state-of-the-art vector search libraries, FAISS and ScaNN, on biologically meaningful gene embeddings. Unlike prior studies, our analysis focuses on bioinformatics-specific embeddings and benchmarks their utility for detect…
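The contrast the abstract draws between exhaustive comparison and the approximate indexes in vector search libraries can be illustrated with a minimal, dependency-free sketch. It is not FAISS or ScaNN code and does not reproduce the paper's benchmark: random unit vectors stand in for gene embeddings, the helper names (`exact_top_k`, `build_ivf`, `ivf_top_k`, `recall_at_k`) are hypothetical, and the inverted-file (IVF) scheme shown is only one of the index families these libraries offer (ScaNN, for example, builds on anisotropic vector quantization). The `n_probe` argument loosely mirrors the `nprobe` setting of FAISS IVF indexes.

```python
import math
import random


def normalize(v):
    # Scale a vector to unit length so the inner product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(db, query, k):
    # Exhaustive scan: score every database vector against the query.
    scored = sorted(((dot(vec, query), i) for i, vec in enumerate(db)), reverse=True)
    return [i for _, i in scored[:k]]


def build_ivf(db, n_lists, n_iter=10, seed=0):
    # Coarse quantizer: a few rounds of spherical k-means (Lloyd iterations),
    # then each vector is filed in the inverted list of its nearest centroid.
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(db, n_lists)]
    for _ in range(n_iter):
        buckets = [[] for _ in range(n_lists)]
        for vec in db:
            c = max(range(n_lists), key=lambda j: dot(vec, centroids[j]))
            buckets[c].append(vec)
        for j, members in enumerate(buckets):
            if members:  # keep the old centroid if its bucket is empty
                dim = len(members[0])
                mean = [sum(m[d] for m in members) / len(members) for d in range(dim)]
                centroids[j] = normalize(mean)
    lists = [[] for _ in range(n_lists)]
    for i, vec in enumerate(db):
        c = max(range(n_lists), key=lambda j: dot(vec, centroids[j]))
        lists[c].append(i)
    return centroids, lists


def ivf_top_k(db, centroids, lists, query, k, n_probe):
    # Approximate search: scan only the n_probe lists whose centroids are most
    # similar to the query, trading recall for fewer similarity computations.
    order = sorted(range(len(centroids)), key=lambda j: dot(query, centroids[j]), reverse=True)
    candidates = [i for j in order[:n_probe] for i in lists[j]]
    scored = sorted(((dot(db[i], query), i) for i in candidates), reverse=True)
    return [i for _, i in scored[:k]]


def recall_at_k(approx, exact):
    # Fraction of the true top-k neighbours that the approximate search recovered.
    return len(set(approx) & set(exact)) / len(exact)


if __name__ == "__main__":
    rng = random.Random(42)
    dim, n = 16, 500
    db = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n)]
    query = normalize([rng.gauss(0, 1) for _ in range(dim)])
    exact = exact_top_k(db, query, k=10)
    centroids, lists = build_ivf(db, n_lists=8)
    for n_probe in (1, 2, 4, 8):
        approx = ivf_top_k(db, centroids, lists, query, k=10, n_probe=n_probe)
        print(f"n_probe={n_probe}: recall@10 = {recall_at_k(approx, exact):.2f}")
```

Raising `n_probe` toward `n_lists` scans more of the database and raises recall; at `n_probe == n_lists` every vector is a candidate and the result matches the exhaustive scan. This speed/recall trade-off is the kind of behavior a comparison of vector search libraries has to measure.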
Related papers
- Vector Embeddings By Sequence Similarity And Context For Improved Compression, Similarity Search, Clustering, Organization, And Manipulation Of cDNA Libraries (2023) · 2.26
- Distributed Representations For Biological Sequence Analysis (2016) · 0.00
- The Faiss Library (2024) · 18.51
- Learned Indexing In Proteins: Extended Work On Substituting Complex Distance Calculations With Embedding And Clustering Techniques (2022) · 5.84
- Let Them Have CAKES: A Cutting-edge Algorithm For Scalable, Efficient, And Exact Search On Big Data (2023) · 2.68
- VectorSearch: Enhancing Document Retrieval With Semantic Embeddings And Optimized Search (2024) · 0.00
- FPScreen: A Rapid Similarity Search Tool For Massive Molecular Library Based On Molecular Fingerprint Comparison (2019) · 0.00
- Utilizing Low-dimensional Molecular Embeddings For Rapid Chemical Similarity Search (2024) · 4.52