When Hashes Met Wedges: A Distributed Algorithm For Finding High Similarity Vectors
2017 · Aneesh Sharma, C. Seshadhri, Ashish Goel
Abstract
Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector that represents the user's network connections, where pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold \(\tau\). In contrast to previous work where \(\tau\) is assumed to be quite close to 1, we focus on recommendation applications where \(\tau\) is small, but still meaningful. The all pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small \(\tau\). To the best of our knowledge, there is no practical solution for computing all u
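To make the problem statement concrete, the following is a minimal sketch of the all-pairs task itself: each user is a sparse vector of network connections, and every pair with cosine similarity above \(\tau\) is reported. This is an exhaustive baseline for illustration only, not the distributed algorithm the paper proposes; the names `cosine`, `similar_pairs`, and the toy `follows` data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {index: weight} dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(w * v[k] for k, w in u.items() if k in v)
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_pairs(vectors, tau):
    """Return every pair (a, b, sim) with sim > tau by exhaustive comparison."""
    out = []
    for a, b in combinations(sorted(vectors), 2):
        sim = cosine(vectors[a], vectors[b])
        if sim > tau:
            out.append((a, b, sim))
    return out


# Hypothetical toy network: each user maps to the accounts they follow.
follows = {
    "alice": {"x": 1.0, "y": 1.0, "z": 1.0},
    "bob": {"x": 1.0, "y": 1.0},
    "carol": {"z": 1.0, "w": 1.0},
    "dave": {"q": 1.0},
}

pairs = similar_pairs(follows, tau=0.3)
for a, b, sim in pairs:
    print(f"{a} ~ {b}: {sim:.3f}")
```

The exhaustive loop compares a number of pairs quadratic in the number of users, which is why the abstract calls the problem computationally challenging at the scale of billions of edges. Lowering `tau` also admits more pairs: in this toy data, a threshold of 0.5 keeps only one pair, while 0.3 keeps two.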
Related papers
- Massively-parallel Similarity Join, Edge-isoperimetry, And Distance Correlations On The Hypercube (2016) 2.26
- Do You Like What I Like? Similarity Estimation In Proximity-based Mobile Social Networks (2018) 7.16
- LSF-Join: Locality Sensitive Filtering For Distributed All-pairs Set Similarity Under Skew (2020) 6.34
- When Similarity Digest Meets Vector Management System: A Survey On Similarity Hash Function (2021) 0.00
- HetFS: A Method For Fast Similarity Search With Ad-hoc Meta-paths On Heterogeneous Information Networks (2025) 4.52
- Meta-path Guided Embedding For Similarity Search In Large-scale Heterogeneous Information Networks (2016) 0.00
- Fast Similarity Sketching (2017) 9.41
- Cluster-wise Unsupervised Hashing For Cross-modal Similarity Search (2019) 11.39