Abstract
Industrial optical inspection lines generate massive streams of high-resolution images, yet many deployed algorithms still rely on hand-crafted features and small, carefully curated datasets, which limits robustness to changing products, lighting conditions, and defect types. A big-data-driven, optics-guided deep learning framework is proposed for automatic surface defect inspection in large-scale manufacturing. The framework integrates three key components: (1) a scalable optical image pipeline for data ingestion, quality assessment, and automatic sample selection; (2) an optics-guided multiscale feature extractor with defect-aware attention, designed to capture subtle reflectance and texture variations under industrial illumination; and (3) an online active-learning and incremental-training scheme that continuously adapts the model to distribution shifts on the production line. Raw images from multiple line-scan and area-scan cameras are first filtered by blur and exposure estimators, deduplicated using perceptual hashing, and balanced through class-aware sampling. A multibranch convolutional backbone with illumination normalization encodes local fine-grained defects and global shape context, while a defect-aware attention module leverages edge and contrast priors from the optical setup to emphasize potential defect regions. The network is trained in a multi-task manner for defect classification and localization with a composite loss that combines class-weighted cross-entropy, segmentation consistency, and edge-alignment regularization. During deployment, low-confidence predictions are routed to human review, and the reviewed samples are periodically used to update the model via an incremental learning strategy with knowledge distillation.
Experiments on a large-scale industrial dataset collected from multiple production lines demonstrate that the proposed framework achieves higher detection accuracy and better robustness to process variations than conventional convolutional neural network (CNN) and one-stage detection baselines, while satisfying real-time constraints on standard graphics processing unit (GPU) hardware.
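To make the image-curation stage described above more concrete, the following is a minimal sketch of blur filtering, exposure filtering, and hash-based deduplication. It is an illustration under stated assumptions, not the framework's actual implementation: the abstract does not name the estimators, so the sketch substitutes the variance of a 4-neighbour Laplacian as the blur estimator, mean gray level as the exposure estimator, and an average hash (aHash) as a simple perceptual hash. The function names (`laplacian_variance`, `mean_intensity`, `average_hash`, `curate`) and all thresholds are hypothetical. Class-aware sampling is not covered.

```python
from typing import List, Sequence, Tuple

# Assumption: a grayscale image is a row-major grid of values in [0, 255],
# at least 3x3 pixels and at least hash_size pixels along each side.
Image = Sequence[Sequence[float]]


def laplacian_variance(img: Image) -> float:
    """Sharpness proxy: variance of the 4-neighbour Laplacian over interior pixels."""
    h, w = len(img), len(img[0])
    responses = [
        4 * img[r][c] - img[r - 1][c] - img[r + 1][c] - img[r][c - 1] - img[r][c + 1]
        for r in range(1, h - 1)
        for c in range(1, w - 1)
    ]
    mean = sum(responses) / len(responses)
    return sum((v - mean) ** 2 for v in responses) / len(responses)


def mean_intensity(img: Image) -> float:
    """Exposure proxy: average gray level of the whole image."""
    return sum(sum(row) for row in img) / (len(img) * len(img[0]))


def average_hash(img: Image, hash_size: int = 8) -> int:
    """Average hash: block-average to hash_size x hash_size, threshold at the mean."""
    h, w = len(img), len(img[0])
    cells = []
    for i in range(hash_size):
        r0, r1 = i * h // hash_size, (i + 1) * h // hash_size
        for j in range(hash_size):
            c0, c1 = j * w // hash_size, (j + 1) * w // hash_size
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    avg = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > avg else 0)
    return bits


def curate(
    images: Sequence[Image],
    blur_min: float = 100.0,                       # hypothetical threshold
    exposure_range: Tuple[float, float] = (30.0, 230.0),  # hypothetical bounds
    max_hamming: int = 4,                          # hypothetical near-duplicate radius
) -> List[int]:
    """Return indices of images that pass all three filters, in input order."""
    lo, hi = exposure_range
    kept: List[int] = []
    kept_hashes: List[int] = []
    for idx, img in enumerate(images):
        if laplacian_variance(img) < blur_min:
            continue  # too little high-frequency content: treated as blurred
        if not lo <= mean_intensity(img) <= hi:
            continue  # under- or over-exposed
        digest = average_hash(img)
        if any(bin(digest ^ other).count("1") <= max_hamming for other in kept_hashes):
            continue  # within Hamming radius of an already kept image
        kept.append(idx)
        kept_hashes.append(digest)
    return kept
```

The deduplication step compares each candidate against every image kept so far, which is quadratic in the number of kept images; at the data volumes the abstract describes, an index over hash values would likely be needed instead of this linear scan.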