Abstract
The exponential growth of machine learning submissions has strained the traditional peer review process, resulting in slow feedback loops for authors and an immense burden on reviewers, who must rigorously audit technical soundness and verify claims against the literature. To address this, we introduce ScholarPeer, a multi-agent framework designed to operationalize the rigorous auditing workflow of a senior researcher. Rather than attempting to replace human judgment, ScholarPeer serves as a co-scientist: it acts as a mentor for rapid author iteration prior to submission and as an active verification assistant that augments human reviewers. The framework structurally decouples contextualization from critique by deploying a sub-domain historian to synthesize the field's trajectory, a baseline scout to proactively hunt for omitted state-of-the-art comparisons, and a multi-aspect Q&A engine that deeply audits technical soundness (scrutinizing internal logical consistency, experimental validity, and mathematical rigor) while cross-referencing claims against work published in top-tier academic venues. We comprehensively evaluate ScholarPeer on ~1,800 ICLR submissions spanning 2020 through 2025. Our results show that ScholarPeer achieves significant win rates against state-of-the-art fine-tuned models and search-augmented agentic baselines.