BrowseComp
Emerging (5 papers using it)
First seen: 2026
BrowseComp is a benchmark for browsing agents. It is used to evaluate how well models manage context during multi-round search interactions.
Papers using BrowseComp (5)
- Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters
- AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
- OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories
- TreeSeeker: Tree-Structured Trial, Error, and Return in Deep Search
- EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge