Research on Progress of Object Detection Framework Based on Deep Learning
Kou Dalei (寇大磊); Quan Jichuan (权冀川); Zhang Zhongwei (张仲伟)
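Abstract:
Since the R-CNN framework was proposed, object detection frameworks based on deep learning have gradually become mainstream. They fall into two categories: those based on candidate windows (region proposals) and those based on regression. Over the past two years, a large number of strong frameworks have emerged that build on classic deep learning detectors such as Faster R-CNN, YOLO, and SSD. This paper organizes and summarizes the frameworks proposed in recent years according to their optimization methods. It compares the performance, strengths, and weaknesses of object detection methods on mainstream benchmarks such as PASCAL VOC and MS COCO. It then discusses the difficulties and challenges the field currently faces and looks ahead to possible directions for development.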
Keywords: deep learning; object detection; convolutional neural network; computer vision