Broken Memories: Detecting and Mitigating Memorization in Diffusion Models with Degraded Generations

Yuanmin Huang·Mi Zhang·Chen Chen·Feifei Li·Geng Hong·Xiaoyu You·Min Yang·2026

arXiv:2605.22050

Abstract

While diffusion models excel at generating high-quality images, their tendency to memorize training data poses significant privacy and copyright risks. In this work, we identify, for the first time, that memorization induces internal numerical instability, often manifesting as visually "broken" artifacts. Inspired by stability analysis in numerical methods, we introduce empirical stability regions based on latent update norms to quantitatively characterize stable behavior during generation. Building on these regions, we propose a principled, on-the-fly framework for step-wise detection and adaptive mitigation. Our approach suppresses memorization without altering prompts or guidance, thereby preserving semantic fidelity and image quality. Extensive experiments on Stable Diffusion 1.4 demonstrate that our method achieves a detection AUC $>0.999$ and a $0.0\%$ memorization rate after mitigation, with negligible overhead ($\approx 0.01$ s per image).
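The abstract does not say how the stability regions are built or how a step is flagged, so the following is only a minimal sketch of one possible reading, not the authors' method. It assumes the latent update norm is the L2 norm of the difference between consecutive latents, and that a region is a per-step band of mean plus or minus a few standard deviations estimated from reference (non-memorized) runs. The function names `update_norms`, `fit_region`, and `first_unstable_step`, the `margin` parameter, and the toy numbers are all hypothetical. Mitigation is not sketched, since the abstract gives no detail on it.

```python
import math


def update_norms(latents):
    """L2 norm of each step-to-step update in one trajectory of flat latents."""
    return [
        math.sqrt(sum((b - a) ** 2 for a, b in zip(prev, cur)))
        for prev, cur in zip(latents, latents[1:])
    ]


def fit_region(reference_norms, margin=3.0):
    """Per-step (lo, hi) band from reference runs.

    Assumed rule: mean +/- margin * standard deviation at each step.
    The paper may define its regions differently.
    """
    region = []
    for step_values in zip(*reference_norms):
        mean = sum(step_values) / len(step_values)
        var = sum((v - mean) ** 2 for v in step_values) / len(step_values)
        std = math.sqrt(var)
        region.append((mean - margin * std, mean + margin * std))
    return region


def first_unstable_step(norms, region):
    """Index of the first step whose norm leaves its band, or None if none does."""
    for step, (norm, (lo, hi)) in enumerate(zip(norms, region)):
        if not lo <= norm <= hi:
            return step
    return None


# Toy data: three reference runs, three denoising steps each.
reference = [[1.0, 0.9, 1.0], [1.2, 1.0, 0.8], [0.8, 1.1, 1.2]]
region = fit_region(reference)

# A run whose last update is far larger than anything in the reference runs.
suspicious = [1.0, 1.0, 5.0]
print(first_unstable_step(suspicious, region))
```

In an actual sampler, a check like this would run inside the denoising loop on the norms observed so far, which is what makes step-wise, on-the-fly detection possible in principle.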
