Abstract

We study the problem of \(\textit\{vector set search\}\) with \(\textit\{vector set queries\}\). This task is analogous to traditional near-neighbor search, with the exception that both the query and each element in the collection are \(\textit\{sets\}\) of vectors. We identify this problem as a core subroutine for semantic search applications and find that existing solutions are unacceptably slow. Towards this end, we present a new approximate search algorithm, DESSERT (\(\{\bf D\}\)ESSERT \(\{\bf E\}\)ffeciently \(\{\bf S\}\)earches \(\{\bf S\}\)ets of \(\{\bf E\}\)mbeddings via \(\{\bf R\}\)etrieval \(\{\bf T\}\)ables). DESSERT is a general tool with strong theoretical guarantees and excellent empirical performance. When we integrate DESSERT into ColBERT, a state-of-the-art semantic search model, we find a 2-5x speedup on the MS MARCO and LoTTE retrieval benchmarks with minimal loss in recall, underscoring the effectiveness and practical applicability of our proposal.

Authors

(none)

Tags

  • ANN Search

Stats

  • citations1
  • S2 citationsβ€”
  • github stars0
  • HF likes0
  • heat score2.26
  • arxiv keyengels2022dessert

Related papers