Abstract
arXiv:2407.03535v3 Announce Type: replace Abstract: Low-light videos often exhibit spatiotemporally incoherent noise, compromising visibility and degrading performance in computer vision applications. A major challenge for enhancing such content using deep learning lies in the scarcity of pixel-aligned, high-quality training data. We introduce BVI-RLV, a fully registered low-light video dataset comprising over 30k paired frames from 40 diverse scenes under two low-light conditions, each aligned with normal-light ground truth. Unlike existing datasets that rely on neutral density (ND) filters or suffer from misalignment issues, BVI-RLV achieves sub-pixel registration for 99.24% of data at full HD resolution across dynamic motion scenarios using a motorized dolly and image-based refinement. The dataset covers a wide range of motion types and realistic temporal noise. We also provide baseline implementations using four representative architectures: Convolutional Neural Network (CNN), Transformer, State Space Model (Mamba), and Diffusion Model (DM). Experiments demonstrate that registration is crucial for supervised learning, yielding up to 5.85 dB PSNR improvement compared to unregistered training. Models trained on BVI-RLV outperform those trained on existing datasets in cross-dataset evaluations, achieving superior performance even in real-world outdoor scenes. Our dataset is publicly available at https://doi.org/10.21227/mzny-8c77.