FAIR-Pruner: Leveraging Tolerance of Difference for Flexible Automatic Layer-Wise Neural Network Pruning

Abstract

Neural network pruning has been widely adopted to reduce the parameter scale of complex neural networks, enabling efficient deployment on resource-limited edge devices. Mainstream pruning methods typically adopt uniform pruning strategies, which tend to cause a substantial performance degradation under high sparsity levels. Recent studies focus on non-uniform layer-wise pruning, but such approaches typically depend on global architecture optimization, which is computational expensive and lacks flexibility. To address these limitations, this paper proposes a novel method named Flexible Automatic Identification and Removal (FAIR)-Pruner, which adaptively determines the sparsity levels of each layer and identifies the units to be pruned. The core of FAIR-Pruner lies in the introduction of a novel indicator, Tolerance of Differences (ToD), designed to balance the importance scores obtained from two complementary perspectives: the architecture-level (Utilization Score) and the task-level (Reconstruction Score). By controlling ToD at preset levels, FAIR-Pruner determines layer-specific thresholds and removes units whose Utilization Scores fall below the corresponding thresholds. Furthermore, by decoupling threshold determination from importance estimation, FAIR-Pruner allows users to flexibly obtain pruned models under varying pruning ratios. Extensive experiments demonstrate that FAIR-Pruner achieves state-of-the-art performance, maintaining higher accuracy even at high compression ratios. Moreover, the ToD based layer-wise pruning ratios can be directly applied to existing powerful importance measurements, thereby improving the performance under uniform-pruning.

Abstract

Related papers