CALVIN (Long-Horizon Manipulation) Leaderboard
CALVIN long-horizon language-conditioned manipulation: the average number of consecutive tasks completed (0 to 5) in the 1000-chain evaluation. Each chain contains five language-instructed subtasks, and the count for a chain stops at the first failure. Higher means longer successful task chains. Metric: Avg Sequence Length (higher is better).
| # | Model | Avg Sequence Length | Paper |
|---|---|---|---|
| 1 | Xiaomi-Robotics-0 | 4.75 | link |
| 2 | VITA | 4.73 | link |
| 3 | NS-VLA | 4.72 | link |
| 4 | AVA-VLA | 4.65 | link |
| 5 | UD-VLA | 4.64 | link |
| 6 | OASIS | 4.57 | link |
| 7 | FLOWER + PTR | 4.56 | link |
| 8 | MoLA | 4.55 | link |
| 9 | ThinkProprio | 4.55 | link |
| 10 | FLOWER (from PTR) | 4.54 | link |
| 11 | UD-VLA + Fast-dVLA | 4.54 | link |
| 12 | FLOWER | 4.53 | link |
| 13 | MDT (from UD-VLA + Fast-dVLA) | 4.52 | link |
| 14 | DeFI | 4.51 | link |
| 15 | MoDE + GraspCorrect | 4.50 | link |
| 16 | VLA-Adapter (from NS-VLA) | 4.50 | link |
| 17 | RoboVLMs (from ELLSA) | 4.49 | link |
| 18 | SDP | 4.49 | link |
| 19 | HARP-VLA | 4.48 | link |
| 20 | Being-H0.7 | 4.48 | link |
| 21 | FOFPred | 4.48 | link |
| 22 | PALM | 4.48 | link |
| 23 | NIAF | 4.47 | link |
| 24 | VPP + TapSampling | 4.46 | link |
| 25 | VLA-IAP (70%) + DreamVLA | 4.45 | link |
| 26 | DFM-VLA+Embed | 4.44 | link |
| 27 | DreamVLA | 4.44 | link |
| 28 | FLOWER (from ThinkProprio) | 4.44 | link |
| 29 | VLA-IAP (50%) + DreamVLA | 4.44 | link |
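To make the metric concrete, here is a minimal sketch of how an average sequence length can be computed from per-chain results. It assumes each chain's result has already been reduced to a single integer: the number of subtasks completed in a row, from 0 to 5. The function names `avg_sequence_length` and `success_rates` are hypothetical and not part of the CALVIN codebase. The sketch also shows that the average equals the sum of the "at least k tasks in a row" success rates for k = 1 to 5.

```python
from typing import List, Sequence

CHAIN_LENGTH = 5  # each evaluation chain has five subtasks (per the description above)


def avg_sequence_length(completed: Sequence[int]) -> float:
    """Mean number of consecutive subtasks completed per chain (hypothetical helper)."""
    if not completed:
        raise ValueError("need at least one chain")
    for n in completed:
        if not 0 <= n <= CHAIN_LENGTH:
            raise ValueError(f"count {n} outside 0..{CHAIN_LENGTH}")
    return sum(completed) / len(completed)


def success_rates(completed: Sequence[int]) -> List[float]:
    """Fraction of chains that completed at least k subtasks in a row, for k = 1..5."""
    total = len(completed)
    return [
        sum(1 for n in completed if n >= k) / total
        for k in range(1, CHAIN_LENGTH + 1)
    ]


if __name__ == "__main__":
    # Toy results for four chains; illustrative only, not leaderboard data.
    chains = [5, 5, 4, 2]
    print(avg_sequence_length(chains))  # 4.0
    rates = success_rates(chains)
    print(rates)  # [1.0, 1.0, 0.75, 0.75, 0.5]
    print(sum(rates))  # 4.0, matching the average sequence length
```

Because a chain that completes n subtasks contributes exactly 1 to each of the first n success rates, summing the five rates recovers the mean count. This is why a score such as 4.75 can be read as roughly "4.75 of 5 subtasks completed in a row on average".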