D4RL Offline RL (Gym-MuJoCo) d4rl-offline Leaderboard
Offline RL on the D4RL Gym-MuJoCo locomotion suite: average D4RL normalized score across the 6 standard datasets (halfcheetah / hopper / walker2d, medium + medium-expert). Normalized so 0 = random policy and 100 = expert. · Metric: Normalized Score (avg) (higher is better)
| # | Model | Normalized Score (avg) | Paper |
|---|---|---|---|
| 1 | EDAC | 92.92 | link |
| 2 | ReBRAC | 89.74 | link |
| 3 | SAC-N | 83.52 | link |
| 4 | IQL | 81.63 | link |
| 5 | CQL | 78.28 | link |
| 6 | TD3+BC | 76.45 | link |
| 7 | Decision Transformer | 73.84 | link |
| 8 | 10% BC | 69.29 | link |
| 9 | AWAC | 67.72 | link |
| 10 | BC (Behavior Cloning) | 50.40 | link |
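
As a rough illustration of the metric described above the table, the sketch below rescales a raw episode return so that a random policy maps to 0 and an expert maps to 100, then averages the per-dataset scores. The function names and every number in it are hypothetical placeholders, not the actual D4RL reference returns or any result from this leaderboard.

```python
from statistics import mean
from typing import Mapping


def normalized_score(raw_return: float, random_return: float, expert_return: float) -> float:
    """Linearly rescale a raw return so that random -> 0 and expert -> 100."""
    return 100.0 * (raw_return - random_return) / (expert_return - random_return)


def average_normalized_score(per_dataset_scores: Mapping[str, float]) -> float:
    """Unweighted mean of the normalized scores, one entry per dataset."""
    return mean(list(per_dataset_scores.values()))


# Hypothetical reference returns and raw returns, for illustration only.
example_scores = {
    "dataset-a": normalized_score(raw_return=400.0, random_return=0.0, expert_return=1000.0),
    "dataset-b": normalized_score(raw_return=600.0, random_return=0.0, expert_return=1000.0),
}
example_average = average_normalized_score(example_scores)
```

Because the rescaling is linear and unclipped, a policy that outperforms the expert reference scores above 100, and one that underperforms the random reference scores below 0.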