Recovering Communities in Structured Random Graphs

Abstract

The problem of recovering planted community structure in random graphs has received considerable attention in the literature on the stochastic block model, where the input is a random graph in which edges crossing between different communities appear with smaller probability than edges inside a community. The communities themselves form a collection of vertex-disjoint sparse cuts in the expected graph, and can be recovered, often exactly, from a sample as long as a separation condition on the intra- and inter-community edge probabilities is satisfied. In this paper, we ask whether recovery is still possible when the expected graph contains a large number of overlapping sparsest cuts. For example, the $d$-dimensional hypercube graph admits $d$ distinct (balanced) sparsest cuts, one for every coordinate. Can these cuts be identified given a random sample of the edges of the hypercube in which each edge is present independently with some probability $p\in (0, 1)$? We show that this is the case, in a very strong sense: the sparsest balanced cut in a sample of the hypercube at rate $p=C\log d/d$, for a sufficiently large constant $C$, is $1/\text{poly}(d)$-close to a coordinate cut with high probability. This is asymptotically optimal and allows approximate recovery of all $d$ cuts simultaneously. Furthermore, for an appropriate sample of hypercube-like graphs, recovery can be made exact. The proof is essentially a strong hypercube cut sparsification bound that combines a theorem of Friedgut, Kalai and Naor on Boolean functions whose Fourier transform concentrates on the first level of the Fourier spectrum with Karger's cut counting argument.
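To make the sampling model concrete, the following is a minimal Python sketch, not the paper's algorithm. It samples the edges of the $d$-dimensional hypercube at rate $p=C\log d/d$ and counts how many sampled edges cross each of the $d$ coordinate cuts. The function names, the choice $d=10$, the constant $C=4$, and the fixed random seed are all illustrative assumptions; the abstract only requires $C$ to be sufficiently large.

```python
import math
import random


def sample_hypercube_edges(d, p, rng):
    """Keep each edge of the d-dimensional hypercube independently with probability p.

    Vertices are the integers 0 .. 2^d - 1 read as bit strings. An edge joins
    two vertices that differ in exactly one bit, and is stored as (u, v, i)
    where i is the index of the differing coordinate.
    """
    edges = []
    for u in range(1 << d):
        for i in range(d):
            v = u ^ (1 << i)
            # Visit each undirected edge once (from its smaller endpoint).
            if u < v and rng.random() < p:
                edges.append((u, v, i))
    return edges


def coordinate_cut_sizes(d, edges):
    """Count the sampled edges crossing each coordinate cut.

    The i-th coordinate cut separates vertices by their i-th bit, so it is
    crossed exactly by the edges that flip coordinate i.
    """
    sizes = [0] * d
    for _, _, i in edges:
        sizes[i] += 1
    return sizes


d = 10
C = 4.0  # hypothetical constant; the abstract only asks for "sufficiently large"
p = min(1.0, C * math.log(d) / d)
rng = random.Random(0)  # fixed seed, chosen only for reproducibility

edges = sample_hypercube_edges(d, p, rng)
sizes = coordinate_cut_sizes(d, edges)
expected = p * 2 ** (d - 1)  # each coordinate cut has 2^(d-1) edges before sampling
print(f"p = {p:.3f}, expected size of each coordinate cut = {expected:.1f}")
print("sampled coordinate cut sizes:", sizes)
```

In the unsampled hypercube every coordinate cut has exactly $2^{d-1}$ edges, so each sampled coordinate cut has expected size $p\,2^{d-1}$. The sketch only measures the planted cuts; finding the sparsest balanced cut of the sample, which is what the result above concerns, is not attempted here.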
