Fast Gumbel-max Sketch And Its Applications

Abstract

The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or more generally a non-negative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element \(i\) in proportion to its positive weight \(v_i\), the Gumbel-Max Trick first computes a Gumbel random variable \(g_i\) for each positive weight element \(i\), and then samples the element \(i\) with the largest value of \(g_i+\ln v_i\). Recently, applications including similarity estimation and weighted cardinality estimation require to generate \(k\) independent Gumbel-Max variables from high dimensional vectors. However, it is computationally expensive for a large \(k\) (e.g., hundreds or even thousands) when using the traditional Gumbel-Max Trick. To solve this problem, we propose a novel algorithm, FastGM, which reduces the time complexity from \(O(kn^+)\) to \(O(k \ln k + n^+)\), where \(n^+\) is the number of positive elements

Abstract

Related papers