Utilizing Low-dimensional Molecular Embeddings For Rapid Chemical Similarity Search
2024 · Kathryn E. Kirchoff, James Wellnitz, Joshua E. Hochuli, et al.
Abstract
Nearest neighbor-based similarity searching is a common task in chemistry, with notable use cases in drug discovery. Yet some of the most commonly used methods for this task still rely on brute-force search. In practice, this can be computationally costly and overly time-consuming, due in part to the sheer size of modern chemical databases. Previous computational advancements for this task have generally relied on improvements to hardware or dataset-specific tricks that lack generalizability. Approaches that leverage lower-complexity searching algorithms remain relatively underexplored. However, many of these algorithms are approximate solutions and/or struggle with typical high-dimensional chemical embeddings. Here we evaluate whether a combination of low-dimensional chemical embeddings and a k-d tree data structure can achieve fast nearest neighbor queries while maintaining performance on standard chemical similarity search benchmarks. We examine different dimensionality redu
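To make the idea in the abstract concrete, the following is a minimal sketch of exact nearest neighbor search with a k-d tree over low-dimensional vectors, checked against a brute-force scan. It is not the authors' implementation. The embeddings are random stand-ins for real molecular embeddings, squared Euclidean distance is an assumed metric, and all names (`KDNode`, `build_kdtree`, `nearest_neighbor`, `brute_force_nearest`) are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


class KDNode:
    """One tree node: a stored point, its original index, and its split axis."""

    def __init__(self, index, point, axis, left, right):
        self.index = index
        self.point = point
        self.axis = axis
        self.left = left
        self.right = right


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _build(items: List[Tuple[int, Point]], depth: int) -> Optional[KDNode]:
    if not items:
        return None
    axis = depth % len(items[0][1])          # cycle through the dimensions
    items = sorted(items, key=lambda it: it[1][axis])
    mid = len(items) // 2                    # median point becomes the node
    index, point = items[mid]
    return KDNode(index, point, axis,
                  _build(items[:mid], depth + 1),
                  _build(items[mid + 1:], depth + 1))


def build_kdtree(points: Sequence[Point]) -> Optional[KDNode]:
    """Build a balanced k-d tree; returns None for an empty input."""
    return _build(list(enumerate(points)), 0)


def nearest_neighbor(node, query, best=None):
    """Return (squared distance, index) of the closest stored point."""
    if node is None:
        return best
    d = squared_distance(node.point, query)
    if best is None or d < best[0]:
        best = (d, node.index)
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest_neighbor(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if diff * diff < best[0]:
        best = nearest_neighbor(far, query, best)
    return best


def brute_force_nearest(points, query):
    """Linear scan baseline: compare the query against every point."""
    return min((squared_distance(p, query), i) for i, p in enumerate(points))


if __name__ == "__main__":
    random.seed(0)
    # Random 8-dimensional vectors stand in for low-dimensional embeddings.
    embeddings = [tuple(random.random() for _ in range(8)) for _ in range(2000)]
    tree = build_kdtree(embeddings)
    query = tuple(random.random() for _ in range(8))
    print(nearest_neighbor(tree, query), brute_force_nearest(embeddings, query))
```

The pruning step in `nearest_neighbor` is what lets the tree skip most of the dataset. It tends to become less effective as dimensionality grows, which is one reason a k-d tree is usually paired with low-dimensional embeddings.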
Related papers
- Scalable Similarity Search For Molecular Descriptors (2016)
- Learned Indexing In Proteins: Extended Work On Substituting Complex Distance Calculations With Embedding And Clustering Techniques (2022)
- Embassi: Embedding Assignment Costs For Similarity Search In Large Graph Databases (2021)
- Chembed: Enhancing Chemical Literature Search Through Domain-specific Text Embeddings (2025)
- Improving Similarity Search With High-dimensional Locality-sensitive Hashing (2018)
- Exploiting Pre-trained Models For Drug Target Affinity Prediction With Nearest Neighbors (2024)
- Unconventional Application Of K-means For Distributed Approximate Similarity Search (2022)
- Fpscreen: A Rapid Similarity Search Tool For Massive Molecular Library Based On Molecular Fingerprint Comparison (2019)