Awesome Object Detection
Object detection is one of the most active areas in Awesome Computer Vision, with 2,782 papers in this collection evaluated on datasets such as COCO, ImageNet, and Pascal VOC. A strong starting point is "Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows".
Key papers
- Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows (2021) | Ze Liu, Yutong Lin, Yue Cao, et al. | 38.40
- End-to-end Object Detection With Transformers (2020) | Nicolas Carion, Francisco Massa, Gabriel Synnaeve, et al. | 38.37
- Pyramid Vision Transformer: A Versatile Backbone For Dense Prediction Without Convolutions (2021) | Wenhai Wang, Enze Xie, Xiang Li, et al. | 33.76
- FairMOT: On The Fairness Of Detection And Re-identification In Multiple Object Tracking (2020) | Yifu Zhang, Chunyu Wang, Xinggang Wang, et al. | 31.02
- Focal Loss For Dense Object Detection (2017) | Tsung-Yi Lin, Priya Goyal, Ross Girshick, et al. | 30.00
- Region Proposal By Guided Anchoring (2019) | Jiaqi Wang, Kai Chen, Shuo Yang, et al. | 29.56
- Unified Perceptual Parsing For Scene Understanding (2018) | Tete Xiao, Yingcheng Liu, Bolei Zhou, et al. | 29.22
- Towards Real-time Multi-object Tracking (2019) | Zhongdao Wang, Liang Zheng, Yixuan Liu, et al. | 28.64
- Real-time Scene Text Detection With Differentiable Binarization (2019) | Minghui Liao, Zhaoyi Wan, Cong Yao, et al. | 28.03
- RepPoints: Point Set Representation For Object Detection (2019) | Ze Yang, Shaohui Liu, Han Hu, et al. | 27.99
- HigherHRNet: Scale-aware Representation Learning For Bottom-up Human Pose Estimation (2019) | Bowen Cheng, Bin Xiao, Jingdong Wang, et al. | 27.97
- Rotate To Attend: Convolutional Triplet Attention Module (2020) | Diganta Misra, Trikay Nalamada, Ajay Uppili Arasanipalai, et al. | 27.62
- M2Det: A Single-shot Object Detector Based On Multi-level Feature Pyramid Network (2018) | Qijie Zhao, Tao Sheng, Yongtao Wang, et al. | 27.18
- A Survey On Visual Transformer (2020) | Kai Han, Yunhe Wang, Hanting Chen, et al. | 26.80
- DETRs Beat YOLOs On Real-time Object Detection (2023) | Yian Zhao, Wenyu Lv, Shangliang Xu, et al. | 26.09
- Few-shot Object Detection With Attention-RPN And Multi-relation Detector (2019) | Qi Fan, Wei Zhuo, Chi-Keung Tang, et al. | 25.47
- Involution: Inverting The Inherence Of Convolution For Visual Recognition (2021) | Duo Li, Jie Hu, Changhu Wang, et al. | 25.47
- CCNet: Criss-cross Attention For Semantic Segmentation (2018) | Zilong Huang, Xinggang Wang, Yunchao Wei, et al. | 25.44
- QueryDet: Cascaded Sparse Query For Accelerating High-resolution Small Object Detection (2021) | Chenhongyi Yang, Zehao Huang, Naiyan Wang | 25.25
- SCAN: Learning To Classify Images Without Labels (2020) | Wouter van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, et al. | 25.15
- Side-aware Boundary Localization For More Precise Object Detection (2019) | Jiaqi Wang, Wenwei Zhang, Yuhang Cao, et al. | 24.91
- Rethinking RGB-D Salient Object Detection: Models, Data Sets, And Large-scale Benchmarks (2019) | Deng-Ping Fan, Zheng Lin, Jia-Xing Zhao, et al. | 24.88
- Detect-to-retrieve: Efficient Regional Aggregation For Image Search (2018) | Marvin Teichmann, Andre Araujo, Menglong Zhu, et al. | 24.71
- Scale Match For Tiny Person Detection (2019) | Xuehui Yu, Yuqi Gong, Nan Jiang, et al. | 24.55
- Structure-measure: A New Way To Evaluate Foreground Maps (2017) | Deng-Ping Fan, Ming-Ming Cheng, Yun Liu, et al. | 24.07
- Rethinking The Competition Between Detection And ReID In Multi-object Tracking (2020) | Chao Liang, Zhipeng Zhang, Xue Zhou, et al. | 24.06
- NAS-FPN: Learning Scalable Feature Pyramid Architecture For Object Detection (2019) | Golnaz Ghiasi, Tsung-Yi Lin, Ruoming Pang, et al. | 24.01
- Grounding DINO: Marrying DINO With Grounded Pre-training For Open-set Object Detection (2023) | Shilong Liu, Zhaoyang Zeng, Tianhe Ren, et al. | 23.46
- TOOD: Task-aligned One-stage Object Detection (2021) | Chengjian Feng, Yujie Zhong, Yu Gao, et al. | 23.11
- Objects Are Different: Flexible Monocular 3D Object Detection (2021) | Yunpeng Zhang, Jiwen Lu, Jie Zhou | 23.03
- Panoptic Feature Pyramid Networks (2019) | Alexander Kirillov, Ross Girshick, Kaiming He, et al. | 22.98
- Dual-level Collaborative Transformer For Image Captioning (2021) | Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, et al. | 22.95
- Bottleneck Transformers For Visual Recognition (2021) | Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, et al. | 22.72
- Weakly-supervised Salient Object Detection Via Scribble Annotations (2020) | Jing Zhang, Xin Yu, Aixuan Li, et al. | 22.59
- Multi-scale Vision Longformer: A New Vision Transformer For High-resolution Image Encoding (2021) | Pengchuan Zhang, Xiyang Dai, Jianwei Yang, et al. | 22.55
- EGNet: Edge Guidance Network For Salient Object Detection (2019) | Jia-Xing Zhao, Jiangjiang Liu, Deng-Ping Fan, et al. | 22.41
- TubeTK: Adopting Tubes To Track Multi-object In A One-step Training Model (2020) | Bo Pang, Yizhuo Li, Yifan Zhang, et al. | 22.36
- ContourNet: Taking A Further Step Toward Accurate Arbitrary-shaped Scene Text Detection (2020) | Yuxin Wang, Hongtao Xie, Zhengjun Zha, et al. | 22.25
- Simple Copy-paste Is A Strong Data Augmentation Method For Instance Segmentation (2020) | Golnaz Ghiasi, Yin Cui, Aravind Srinivas, et al. | 22.19
- Mask TextSpotter v3: Segmentation Proposal Network For Robust Scene Text Spotting (2020) | Minghui Liao, Guan Pang, Jing Huang, et al. | 22.16
- Gliding Vertex On The Horizontal Bounding Box For Multi-oriented Object Detection (2019) | Yongchao Xu, Mingtao Fu, Qimeng Wang, et al. | 22.09
- Distribution Alignment: A Unified Framework For Long-tail Visual Recognition (2021) | Songyang Zhang, Zeming Li, Shipeng Yan, et al. | 22.08
- Deep Affinity Network For Multiple Object Tracking (2018) | Shijie Sun, Naveed Akhtar, Huansheng Song, et al. | 22.06
- Tracking Without Bells And Whistles (2019) | Philipp Bergmann, Tim Meinhardt, Laura Leal-Taixe | 22.04
- DetectoRS: Detecting Objects With Recursive Feature Pyramid And Switchable Atrous Convolution (2020) | Siyuan Qiao, Liang-Chieh Chen, Alan Yuille | 22.04
- Hand Keypoint Detection In Single Images Using Multiview Bootstrapping (2017) | Tomas Simon, Hanbyul Joo, Iain Matthews, et al. | 22.00
- Pose-guided Visible Part Matching For Occluded Person ReID (2020) | Shang Gao, Jingya Wang, Huchuan Lu, et al. | 21.99
- Dynamic Head: Unifying Object Detection Heads With Attentions (2021) | Xiyang Dai, Yinpeng Chen, Bin Xiao, et al. | 21.96
- Bottom-up Object Detection By Grouping Extreme And Center Points (2019) | Xingyi Zhou, Jiacheng Zhuo, Philipp Krähenbühl | 21.92
- Feature Selective Anchor-free Module For Single-shot Object Detection (2019) | Chenchen Zhu, Yihui He, Marios Savvides | 21.85
- Detection In Crowded Scenes: One Proposal, Multiple Predictions (2020) | Xuangeng Chu, Anlin Zheng, Xiangyu Zhang, et al. | 21.83
- Vision Transformer With Deformable Attention (2022) | Zhuofan Xia, Xuran Pan, Shiji Song, et al. | 21.78
- Pixel-in-pixel Net: Towards Efficient Facial Landmark Detection In The Wild (2020) | Haibo Jin, Shengcai Liao, Ling Shao | 21.70
- Distilling Object Detectors Via Decoupled Features (2021) | Jianyuan Guo, Kai Han, Yunhe Wang, et al. | 21.66
- Hierarchical Dynamic Filtering Network For RGB-D Salient Object Detection (2020) | Youwei Pang, Lihe Zhang, Xiaoqi Zhao, et al. | 21.50
- VinVL: Revisiting Visual Representations In Vision-language Models (2021) | Pengchuan Zhang, Xiujun Li, Xiaowei Hu, et al. | 21.40
- RTMO: Towards High-performance One-stage Real-time Multi-person Pose Estimation (2023) | Peng Lu, Tao Jiang, Yining Li, et al. | 21.36
- Cross-X Learning For Fine-grained Visual Categorization (2019) | Wei Luo, Xitong Yang, Xianjie Mo, et al. | 21.36
- EDN: Salient Object Detection Via Extremely-downsampled Network (2020) | Yu-Huan Wu, Yun Liu, Le Zhang, et al. | 21.36
- YOLO-MS: Rethinking Multi-scale Representation Learning For Real-time Object Detection (2023) | Yuming Chen, Xinbin Yuan, Jiabao Wang, et al. | 21.21