Fully Dynamic Euclidean k-Means

Abstract

We consider the Euclidean $k$-means clustering problem in a dynamic setting, where we have to explicitly maintain a solution (a set of $k$ centers) $S \subseteq \mathbb{R}^d$ subject to point insertions/deletions in $\mathbb{R}^d$. We present a dynamic algorithm for Euclidean $k$-means with $\mathrm{poly}(1/\epsilon)$-approximation ratio, $\tilde{O}(k^{\epsilon})$ update time, and $\tilde{O}(1)$ recourse, for any $\epsilon \in (0,1)$, even when $d$ and $k$ are both part of the input. This is the first algorithm to achieve a constant ratio with $o(k)$ update time for this problem, whereas the previous $O(1)$-approximation runs in $\tilde O(k)$ update time [Bhattacharya, Costa, Farokhnejad; STOC'25]. In fact, previous algorithms cannot go beyond $O(k)$ update time precisely because they are designed for general metrics where an $\Omega(k)$ lower bound is known. We break this $O(k)$ barrier by devising new fundamental data structures to utilize Euclidean properties: a structure that (implicitly) maintains a clustering subject to both center and data point updates, and a range query structure that can evaluate a mergeable function over any metric ball range given as a query. To obtain these structures, we devise the first consistent hashing scheme [Czumaj, Jiang, Krauthgamer, Vesel{\'{y}}, Yang; FOCS'22] that achieves $\tilde O(n^{\epsilon})$ running time per point evaluation with competitive parameters. Our final algorithm exploits the framework of [Bhattacharya, Costa, Farokhnejad; STOC'25] for general metrics. The key change is to redesign several critical subroutines so that they reduce to our new Euclidean data structures, replacing the general-metric implementations that are unlikely to run efficiently even when Euclidean properties are provided.

Abstract

Related papers