FID on ImageNet 256x256 (class-conditional) Leaderboard (fid-imagenet-256)
Class-conditional image generation quality: Fréchet Inception Distance (FID) on ImageNet at 256x256, the canonical benchmark for image generators. Values are the guided results (classifier-free guidance, or classifier guidance for ADM-G), traced to each method's paper. · Metric: FID (lower is better)
| # | Model | FID | Paper |
|---|---|---|---|
| 1 | REPA (SiT-XL/2 + REPA) | 1.42 | link |
| 2 | TexTok (DiT + text-conditioned tokenizer) | 1.46 | - |
| 3 | MAR-H (Diffusion Loss) | 1.55 | link |
| 4 | VAR-d30 (Visual AutoRegressive) | 1.73 | link |
| 5 | M-VAR-d32 | 1.78 | - |
| 6 | SiT-XL/2 | 2.06 | link |
| 7 | DiT-XL/2 | 2.27 | link |
| 8 | LDM-4 (Latent Diffusion) | 3.60 | link |
| 9 | ADM-G (Guided Diffusion) | 4.59 | link |
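
For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to feature vectors of real and generated images: the squared distance between the means plus Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^(1/2)). The sketch below is a minimal NumPy illustration of that formula, not the evaluation code behind any entry in the table. The function names `fid_from_statistics` and `fid_from_features` are hypothetical, and the code assumes features have already been extracted (in practice usually from an Inception-v3 pooling layer). It takes the trace of the matrix square root from the eigenvalues of Σ_r Σ_g rather than forming the square root explicitly.

```python
import numpy as np


def fid_from_statistics(mu_real, sigma_real, mu_gen, sigma_gen):
    """Frechet distance between N(mu_real, sigma_real) and N(mu_gen, sigma_gen)."""
    diff = mu_real - mu_gen
    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the eigenvalues
    # of S_r S_g, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(sigma_real @ sigma_gen)
    # Drop tiny imaginary parts and negative values caused by round-off.
    tr_covmean = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(
        diff @ diff
        + np.trace(sigma_real)
        + np.trace(sigma_gen)
        - 2.0 * tr_covmean
    )


def fid_from_features(feats_real, feats_gen):
    """Fit a Gaussian to each (n_samples, n_features) array and compare them."""
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    return fid_from_statistics(mu_r, sigma_r, mu_g, sigma_g)


if __name__ == "__main__":
    # Assumption: random vectors stand in for real Inception features.
    rng = np.random.default_rng(0)
    real = rng.normal(size=(1000, 8))
    fake = rng.normal(loc=0.5, size=(1000, 8))
    print("FID (toy features):", fid_from_features(real, fake))
```

Because the estimate depends on the feature extractor, the reference statistics, and the number of samples, scores are only comparable when those settings match, which is why each value in the table is traced to its own paper.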