
Qwen-3-8B-Base

Emerging
4 papers using it
2025 first seen

Qwen3-8B-Base, listed here as a dataset/benchmark, is a base language model used to evaluate reinforcement-learning-based post-training methods for large language models, with a focus on the statistical properties of policy-gradient estimators and their optimization algorithms.

Papers using Qwen-3-8B-Base (4)
