Abstract
Defect segmentation is central to computer-vision-based inspection of infrastructure assets during both construction and operation. However, deployment remains limited by scarce pixel-level labels and by domain shift across environments. We introduce CrackSegFlow, a controllable Flow Matching (FM) synthesis method that renders synthetic crack images from masks with pixel-level alignment. Our renderer combines topology-preserving mask injection with edge gating to maintain thin-structure continuity. A class-conditional FM model samples masks for topology diversity, and CrackSegFlow renders images from them that align with these ground-truth masks. We further inject cracks onto crack-free backgrounds to diversify confounders and reduce false positives. Across five datasets with a CNN-Transformer backbone, adding synthesized pairs improves in-domain performance by 5.37 mIoU and 5.13 F1, while target-guided cross-domain synthesis driven by target mask statistics yields gains of 13.12 mIoU and 14.82 F1. We also release CSF-50K, a benchmark dataset comprising 50,000 image-mask pairs.
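For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how mIoU and F1 are commonly computed for binary crack masks. It assumes mIoU averages the IoU of the crack and background classes and that F1 is taken on the crack class; the abstract does not state the exact evaluation protocol, and the function names `confusion_counts` and `crack_metrics` are hypothetical.

```python
def confusion_counts(pred, target):
    """Count TP, FP, FN, TN for two same-shaped binary masks (nested lists of 0/1)."""
    tp = fp = fn = tn = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def crack_metrics(pred, target):
    """Return mIoU (mean of crack and background IoU) and crack-class F1.

    Assumption: a two-class average for mIoU; the paper's protocol may differ.
    Empty denominators are treated as a perfect score of 1.0.
    """
    tp, fp, fn, tn = confusion_counts(pred, target)
    iou_crack = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    iou_background = tn / (tn + fp + fn) if (tn + fp + fn) else 1.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 1.0
    return {"miou": (iou_crack + iou_background) / 2, "f1": f1}


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 0, 0]]
    reference = [[1, 0, 0], [0, 1, 0]]
    print(crack_metrics(predicted, reference))
```

Because cracks occupy few pixels, the background IoU is typically high, so the crack-class IoU and F1 tend to be the more sensitive indicators of thin-structure quality.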