Unifying Low Dimensional Spectra in Deep Learning

Abstract

arXiv:2404.06106v3 Announce Type: replace Abstract: Low dimensional structures appear ubiquitously in the eigenspectra of deep learning matrices in classification networks trained in the overparameterized regime. While theoretical advances have aimed to explain this phenomenology, they typically succeed only in capturing subsets of the full behavior or rely on assumptions that cannot hold in practice. In this work, we provide an analytic explanation for the bulk plus outlier structure of several canonical deep learning matrices, including the Hessian, gradients, and weights. We achieve this using unconstrained feature models (UFMs), a now-common tool for studying the emergence of deep neural collapse (DNC). We show that DNC is the source of these low dimensional eigenspectra, in each case, the eigenvalues and eigenvectors can be constructed from feature means, the characterizing objects of DNC. This provides a unifying analytic explanation for a wide range of spectral phenomena in deep learning and goes beyond empirical characterizations, which typically focus on eigenvalues, by providing a detailed analysis of eigenvectors. We prove that our results hold for both linear and ReLU networks and provide numerical validation in both the modeling context and standard deep-network architectures on canonical datasets.

Abstract

Related papers