FPScreen: A Rapid Similarity Search Tool For Massive Molecular Library Based On Molecular Fingerprint Comparison
2019 · Lijun Wang, Jianbing Gong, Yingxia Zhang, et al.
Abstract
We designed FPScreen, a fast similarity search engine for large molecular libraries. We downloaded the structure files (SDF format) of 100 million molecules from PubChem, used the cheminformatics toolkit RDKit to convert each structure into a single line of text holding its MACCS fingerprint, and stored these lines in one text file that serves as our molecule library. The search engine traverses the library file line by line, comparing each 166-bit string against the query as it goes. FPScreen completes a similarity search over all 100 million entries in the library within one hour, which is fast for a computational biology tool. In addition, we divided the library into several strides for parallel processing. FPScreen is provided as a web application.
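The line-by-line scan described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes each library line is a string of `0` and `1` characters and that similarity is the Tanimoto coefficient (the abstract does not name the metric). It also assumes a "stride" means every n-th line of the file, which is only one possible reading. The names `tanimoto`, `scan`, `stride`, and `n_strides` are hypothetical, and fingerprint generation with RDKit is not shown.

```python
from typing import Iterable, Iterator, Tuple


def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient of two fingerprints held as Python ints."""
    union = bin(a | b).count("1")
    if union == 0:
        # Both fingerprints are empty; treat them as dissimilar.
        return 0.0
    return bin(a & b).count("1") / union


def scan(
    query_bits: str,
    lines: Iterable[str],
    threshold: float,
    stride: int = 0,
    n_strides: int = 1,
) -> Iterator[Tuple[int, float]]:
    """Yield (line_index, score) for library lines scoring >= threshold.

    Assumption: a worker handles only the lines whose index satisfies
    index % n_strides == stride, so n_strides workers can each scan
    one stride of the same file in parallel.
    """
    query = int(query_bits, 2)
    for index, line in enumerate(lines):
        if index % n_strides != stride:
            continue
        bits = line.strip()
        if not bits:
            continue  # skip blank lines
        score = tanimoto(query, int(bits, 2))
        if score >= threshold:
            yield index, score
```

In practice `lines` could be an open file handle over the library text file, so the library is streamed rather than loaded into memory. Each worker process would call `scan` with its own `stride` value and the hits would be merged afterwards.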
Related papers
- Scalable Similarity Search For Molecular Descriptors (2016) · 0.00
- Utilizing Low-dimensional Molecular Embeddings For Rapid Chemical Similarity Search (2024) · 4.52
- Fast And Scalable Gene Embedding Search: A Comparative Study Of FAISS And Scann (2025) · 2.26
- FLASH: Randomized Algorithms Accelerated Over CPU-GPU For Ultra-high Dimensional Similarity Search (2017) · 9.23
- Learned Indexing In Proteins: Extended Work On Substituting Complex Distance Calculations With Embedding And Clustering Techniques (2022) · 5.84
- A Resource-frugal Probabilistic Dictionary And Applications In Bioinformatics (2017) · 9.41
- Thin Bridges For Drug Text Alignment: Lightweight Contrastive Learning For Target Specific Drug Retrieval (2025) · 0.00
- A Fast Text Similarity Measure For Large Document Collections Using Multi-reference Cosine And Genetic Algorithm (2018) · 4.52