← all papers Β· overview

Multi-Stage VLM Pipeline for Zero-Shot Traffic Accident Understanding

Abstract

arXiv:2605.29325v1 Announce Type: new Abstract: We present the 1st-place solution to the ACCIDENT challenge at the CVPR 2026 AUTOPILOT Workshop, which asks for zero-shot prediction of accident timing, impact centroid, and collision type from CCTV footage. On a frozen Qwen3-VL-32B-Instruct checkpoint we build a three-stage pipeline (full-video joint prediction, time refinement, and single-frame grounding of the impact centroid), run the same pipeline a second time on a 235B Mixture-of-Experts sibling, blend the two outputs 9:1, and finally snap each predicted point onto the nearest vehicle detection. The final system reaches Public LB 0.55469 / Private LB 0.57080, roughly +0.21 over the strongest host baseline (Molmo-7B, 0.358) and wins the challenge. We ablate each component, report the negative results that shaped the final design, and release the code at https://github.com/fuumin621/cvpr2026-accident-1st-place-solution.

Code

Related papers