From Sublinear to Linear: Local Convergence in Finite-Width Networks via Locally Polyak-Lojasiewicz Regions

Abstract

arXiv:2507.21429v3 Announce Type: replace-cross Abstract: We study local linear convergence of gradient descent for finite-width feedforward networks under the squared empirical loss. Prior work shows that GD can remain confined to a Locally Quasi-Convex Region (LQCR) around initialization, but only gives a sublinear rate. We show that if the empirical Neural Tangent Kernel is positive at initialization, Lipschitz stable on the LQCR, and compatible with the LQCR radius, then the squared loss satisfies a local Polyak-{\L}ojasiewicz inequality with constant $\mu = \lambda_0 - L_\Theta r(\Rcal) > 0$. Combined with fixed-step iterate containment in the LQCR, imposed as a hypothesis in the linear-rate theorem, this yields linear convergence on the region. The LQCR supplies localization; fixed-step containment is imposed as a hypothesis in the linear-rate theorem; and the PL inequality comes from NTK conditioning under squared loss. The result is therefore a sufficient local condition, not a claim that this mechanism is necessary or unique for fast convergence. Empirically, we probe the theory through NTK spectral gap, parameter drift, empirical PL ratio, and suboptimality decay. On binary MNIST, the NTK remains positive, the PL ratio has a positive lower envelope, and the loss shows geometric decay on the stable regime. In a width ablation, the fixed-step width-$1024$ run leaves the local regime; reducing the step size lowers final drift from $1.870$ to $0.158$, restores the observed local-regime diagnostics, and yields the largest empirical PL-ratio lower envelope observed in the study. A CNN robustness check on a CIFAR-10 subset shows the PL-ratio envelope remains positive across three seeds, with a positive lower envelope across all three seeds on the stable regime.

Abstract

Related papers