
Qwen-3-1.7B

Emerging
3 papers using it
2025 first seen

'Qwen-3-1.7B' is used to evaluate models on reasoning tasks, specifically how much applying the ADPO method improves response quality and robustness.

Papers using Qwen-3-1.7B (3)

Qwen-3-1.7B | datasets | reinforcement-learning