MuJoCo HalfCheetah (Online Control) Leaderboard
Online continuous control on Gym/MuJoCo HalfCheetah-v4: average episodic return after 1M environment steps. The canonical locomotion task for benchmarking continuous-control RL. (Cross-paper numbers use slightly different eval protocols; see per-row notes.) · Metric: Average Return (higher is better)
| # | Model | Average Return | Paper |
|---|---|---|---|
| 1 | TD7 (For SALE) | 17433.00 | link |
| 2 | DDPG (CleanRL) | 10374.07 | link |
| 3 | SAC (CleanRL) | 9634.89 | link |
| 4 | TD3 (CleanRL) | 9583.22 | link |
| 5 | PPO (CleanRL) | 1442.64 | link |
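
For readers unfamiliar with the metric, here is a minimal sketch of how an average episodic return might be computed from logged evaluation rewards. It assumes the return of an episode is the undiscounted sum of its per-step rewards and that the reported score is the plain mean over evaluation episodes. The function names `episodic_return` and `average_return` are hypothetical, and the number of evaluation episodes and seeds differs between the papers listed above.

```python
from statistics import mean
from typing import Sequence


def episodic_return(rewards: Sequence[float]) -> float:
    """Undiscounted sum of the per-step rewards of one episode (assumed definition)."""
    return float(sum(rewards))


def average_return(episodes: Sequence[Sequence[float]]) -> float:
    """Mean episodic return over a batch of evaluation episodes."""
    if not episodes:
        raise ValueError("need at least one evaluation episode")
    return mean(episodic_return(ep) for ep in episodes)


# Toy usage with made-up reward logs (not real HalfCheetah data).
toy_episodes = [[1.0, 2.0], [3.0, 5.0]]
toy_score = average_return(toy_episodes)
```

In practice the reward lists would come from rolling out the trained policy in HalfCheetah-v4 after the 1M-step training budget; that rollout loop is omitted here because it depends on each paper's evaluation protocol.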