FlowerTune LLM: General NLP Leaderboard (flowertune-general-nlp)
FlowerTune cross-domain benchmark for federated fine-tuning of LLMs, General NLP track. Teams fine-tune a base model (<=13B params, <=200 GB communication budget) with LoRA in a federated setting across decentralized clients; Avg Score aggregates MMLU-style accuracy over STEM, Social Sciences, and Humanities. The first standardized, protocol-consistent leaderboard for federated LLM fine-tuning. · Metric: Avg Score (higher is better)
| # | Team | Base model | Avg Score | Paper |
|---|---|---|---|---|
| 1 | ZeroOne.AI | Internlm3-8b-instruct | 69.25 | - |
| 2 | Gachon Cognitive Computing Lab | Internlm3-8b-instruct | 69.19 | - |
| 3 | T-IoI@UNR | Gemma2-9b-cpt-sahabatai-v1-instruct | 67.78 | - |
| 4 | FL-finetune-JB-DC | Qwen2.5-7B-Instruct | 67.71 | - |
| 5 | Gachon Cognitive Computing Lab | Gemma2-9B-instruct | 64.84 | - |
| 6 | ZJUDAI | Qwen2.5-7B-Instruct | 64.04 | - |
| 7 | Massimo R. Scamarcia | Phi-4 | 55.64 | - |
| 8 | ZJUDAI | Qwen2.5-1.5B-Instruct | 53.32 | - |
| 9 | Alessandro Pinto | Qwen2.5-1.5B-Instruct | 52.77 | - |
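
The description says Avg Score aggregates accuracy over STEM, Social Sciences, and Humanities, but it does not give the exact formula. The sketch below assumes an unweighted mean of the three per-category accuracies, rounded to two decimals. The function name `avg_score` and the input numbers are hypothetical and do not come from any leaderboard entry.

```python
from statistics import mean

# Categories named in the leaderboard description.
CATEGORIES = ("STEM", "Social Sciences", "Humanities")


def avg_score(category_accuracy: dict[str, float]) -> float:
    """Return the unweighted mean of per-category accuracies (in percent).

    Assumption: the leaderboard's Avg Score is a plain mean over the three
    categories. The source does not state whether any weighting is applied.
    """
    missing = [c for c in CATEGORIES if c not in category_accuracy]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    return round(mean(category_accuracy[c] for c in CATEGORIES), 2)


# Hypothetical per-category accuracies, for illustration only.
example = {"STEM": 60.0, "Social Sciences": 75.0, "Humanities": 66.0}
print(avg_score(example))
```

If the benchmark weights categories differently, for example by number of questions, `mean` would need to be replaced with a weighted average.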