| # | Model | Score | Paper |
|---|-------|-------|-------|
| 1 | Rethinking UMM Visual Generation: Masked Modeling for Efficient Image-Only Pre-training | 0.89 | - |
| 2 | MammothModa2: A Unified AR-Diffusion Framework for Multimodal Understanding and Generation | 0.87 | - |
| 3 | UniDDT: Unifying Multimodal Understanding and Generation with Decoupled Diffusion Transformer | 0.87 | - |
GenEval geneval Leaderboard