Abstract
This study systematically evaluated all configurations of You Only Look Once (YOLO)-based object detection algorithms, including YOLOv8, YOLOv9, YOLOv10, YOLO11, and YOLOv12, under real-world conditions for detecting immature green fruitlets in commercial orchards. Models were assessed using precision, recall, mean Average Precision at 50% Intersection over Union (mAP@50), and computational efficiency across the pre-processing, inference, and post-processing stages. Field-level fruitlet counting was also validated using images captured with both Intel RealSense and iPhone 14 Pro Max sensors. YOLOv12l achieved the highest recall (0.900), while YOLOv10x and YOLOv9 GELAN-c achieved the highest precision (0.908 and 0.903, respectively). YOLOv9 GELAN-base and GELAN-e achieved the highest mAP@50 (0.935), followed by YOLO11s (0.933) and YOLOv12l (0.931). In counting validation, YOLO11n was the most accurate model, with RMSE values of 4.51 to 4.96 and MAE values of 3.85 to 7.73 across four apple varieties. Sensor-specific training on Intel RealSense data further improved detection performance. YOLO11n also recorded the fastest inference speed (2.4 ms), faster than YOLOv8n, YOLOv9 GELAN-s, YOLOv10n, and YOLOv12n, which supports its suitability for real-time orchard applications.
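For readers unfamiliar with the counting metrics, the following is a minimal sketch of how RMSE and MAE can be computed from paired per-image counts (model-predicted versus manually observed). The function name `count_errors` and the example counts are hypothetical and chosen only for illustration; they are not data or code from this study.

```python
import math


def count_errors(predicted, observed):
    """Return (rmse, mae) for paired fruitlet counts, one pair per image."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    # Signed difference between predicted and observed count for each image.
    residuals = [p - o for p, o in zip(predicted, observed)]
    # MAE: average magnitude of the counting error.
    mae = sum(abs(r) for r in residuals) / len(residuals)
    # RMSE: square root of the mean squared error; penalizes large misses more.
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return rmse, mae


# Hypothetical counts for illustration only, not measurements from the study.
predicted_counts = [12, 9, 15, 7]
observed_counts = [10, 9, 18, 8]

rmse, mae = count_errors(predicted_counts, observed_counts)
print(f"RMSE={rmse:.2f}, MAE={mae:.2f}")
```

When both metrics are computed over the same set of images, RMSE is never smaller than MAE, and a wide gap between them suggests that a few images carry large counting errors.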