Abstract
Edit distance similarity search, also called approximate pattern matching, is a fundamental problem with widespread database applications. The goal of the problem is to preprocess \(n\) strings of length \(d\), to quickly answer queries \(q\) of the form: if there is a database string within edit distance \(r\) of \(q\), return a database string within edit distance \(cr\) of \(q\). Previous approaches to this problem either rely on very large (superconstant) approximation ratios \(c\), or very small search radii \(r\). Outside of a narrow parameter range, these solutions are not competitive with trivially searching through all \(n\) strings. In this work give a simple and easy-to-implement hash function that can quickly answer queries for a wide range of parameters. Specifically, our strategy can answer queries in time \(\tilde\{O\}(d3^rn^\{1/c\})\). The best known practical results require \(c \gg r\) to achieve any correctness guarantee; meanwhile, the best known theoretical resul