Qwen-3-8B-Base
Emerging
4 papers using it
First seen: 2025
Qwen3-8B-Base is a pretrained base language model, indexed here as a dataset/benchmark, that is used to evaluate reinforcement learning-based post-training methods for large language models. The papers using it focus on the statistical properties of policy-gradient estimators and the associated optimization algorithms.