FlowerTune LLM: General NLP Leaderboard

FlowerTune is a cross-domain benchmark for federated fine-tuning of LLMs; this leaderboard covers the General NLP track. Teams fine-tune a base model (≤13B parameters, ≤200 GB communication budget) with LoRA in a federated setting across decentralized clients. Avg Score aggregates MMLU-style accuracy over STEM, Social Sciences, and Humanities. This is the first standardized, protocol-consistent leaderboard for federated LLM fine-tuning. Metric: Avg Score (higher is better).
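As a rough illustration of how an aggregate like Avg Score could be formed from the three category accuracies, here is a minimal sketch. It assumes an unweighted mean, which the description above does not confirm; the function name `avg_score` and the input numbers are hypothetical and are not taken from the leaderboard.

```python
from statistics import mean


def avg_score(stem: float, social_sciences: float, humanities: float) -> float:
    """Return the unweighted mean of three category accuracies (in percent).

    Assumption: equal weighting per category. The benchmark's actual
    aggregation rule may differ.
    """
    return round(mean([stem, social_sciences, humanities]), 2)


# Hypothetical per-category accuracies, for illustration only.
example = avg_score(stem=60.0, social_sciences=75.0, humanities=66.0)
print(example)
```

If the categories were instead weighted by their number of questions, the result would shift toward the larger categories, so the weighting rule matters when comparing closely ranked entries.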


#	Model	Avg Score	Paper
1	ZeroOne.AI — Internlm3-8b-instruct	69.25	—
2	Gachon Cognitive Computing Lab — Internlm3-8b-instruct	69.19	—
3	T-IoI@UNR — Gemma2-9b-cpt-sahabatai-v1-instruct	67.78	—
4	FL-finetune-JB-DC — Qwen2.5-7B-Instruct	67.71	—
5	Gachon Cognitive Computing Lab — Gemma2-9B-instruct	64.84	—
6	ZJUDAI — Qwen2.5-7B-Instruct	64.04	—
7	Massimo R. Scamarcia — Phi-4	55.64	—
8	ZJUDAI — Qwen2.5-1.5B-Instruct	53.32	—
9	Alessandro Pinto — Qwen2.5-1.5B-Instruct	52.77	—