Optimal Local Convergence Rates of Stochastic First-Order Methods under Local $\alpha$-PL

Abstract

We study the local convergence rate of stochastic first-order methods under a local $\alpha$-Polyak-Łojasiewicz ($\alpha$-PL) condition in a neighborhood of a target connected component $\mathcal{M}$ of the local minimizer set. The parameter $\alpha \in [1,2]$ is the exponent of the gradient norm in the $\alpha$-PL inequality: $\alpha=2$ recovers the classical PL case, $\alpha=1$ corresponds to Hölder-type error bounds, and intermediate values interpolate between these regimes. Our performance criterion is the number of oracle queries required to output $\hat{x}$ with $F(\hat{x})-l \le \varepsilon$, where $l := F(y)$ for any $y \in \mathcal{M}$. We work in a local regime where the algorithm is initialized near $\mathcal{M}$ and, with high probability, its iterates remain in that neighborhood. We establish a lower bound $\Omega(\varepsilon^{-2/\alpha})$ for all stochastic first-order methods in this regime, and we obtain a matching upper bound $\mathcal{O}(\varepsilon^{-2/\alpha})$ for $1 \le \alpha < 2$ via a SARAH-type variance-reduced method with time-varying batch sizes and step sizes. In the convex setting, assuming a local $\alpha$-PL condition on the $\varepsilon$-sublevel set, we further show a complexity lower bound $\widetilde{\Omega}(\varepsilon^{-2/\alpha})$ for reaching an $\varepsilon$-global optimum, matching the $\varepsilon$-dependence of known accelerated stochastic subgradient methods.
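
For orientation, one common way to write a condition of this type is sketched below. The constant $\mu > 0$ and the neighborhood $\mathcal{U}$ of $\mathcal{M}$ are assumed notation introduced here for illustration only; the precise form and constants used in the paper may differ.

$$\|\nabla F(x)\|^{\alpha} \;\ge\; \mu \,\bigl(F(x) - l\bigr) \qquad \text{for all } x \in \mathcal{U}.$$

Whatever the exact form of the condition, the stated rate $\varepsilon^{-2/\alpha}$ evaluates to $\varepsilon^{-1}$ at $\alpha = 2$ and to $\varepsilon^{-2}$ at $\alpha = 1$, so the query complexity grows as $\alpha$ decreases from 2 to 1.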
