SUSTech KC (南方科技大学知识苑): Research on High-Precision Millisecond-Level Face Detection Technology for Edge Devices

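Title	面向边缘设备的高精度毫秒级人脸检测技术研究
Alternative title	Research on High-Precision Millisecond-Level Face Detection Technology for Edge Devices
Name	吴伟
Name (pinyin)	WU Wei
Student ID	12032501
Degree type	Master's
Degree discipline	0809 Electronic Science and Technology
Discipline category / professional degree category	08 Engineering
Advisor	于仕琪 (Shiqi Yu)
Advisor's department	Department of Computer Science and Engineering
Thesis defense date	2023-05-13
Thesis submission date	2023-06-29
Degree-granting institution	Southern University of Science and Technology
Place of degree conferral	Shenzhen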
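Abstract	As Internet of Things (IoT) technology continues to develop and spread, face detection, an important computer vision task, plays an increasingly important role in IoT applications and is one of the key technologies for intelligent interaction between people and edge devices. However, face detection on edge devices faces many challenges: it must fit hardware constraints such as low power, low compute performance, and limited storage, and it must also adapt to complex and varied scenarios and requirements while meeting performance targets such as high accuracy, high speed, and high robustness. To address this problem, this thesis proposes YuNet, an ultra-lightweight face detection algorithm. The thesis puts forward lightweight model design principles and manually designs the backbone network, neck network, and detection head, which greatly reduces the parameter count and computation of the model; while preserving efficiency and high accuracy, this also makes the model simpler and more controllable. YuNet further adopts three techniques: an anchor-free mechanism, an adaptive label matching strategy, and a sample-balanced data augmentation strategy. These allow the model to retain high accuracy at millisecond-level detection speed and effectively address problems such as small-object detection and imbalanced object distribution. YuNet not only outperforms traditional face detection algorithms based on handcrafted features across the board, but, at the same accuracy level, also has an order-of-magnitude advantage in parameter count over other deep-learning-based lightweight face detectors, along with a clear advantage in speed. To the best of the author's knowledge, YuNet is among the face detectors with the fewest parameters that still maintain high accuracy, and among the fastest. Based on YuNet, this thesis also develops the open-source project libfacedetection, which is implemented in C++ with no dependency on third-party libraries, applies code-level SIMD optimization on multiple instruction-set platforms (AVX2/AVX512/NEON), and supports fast deployment. Experimental results show that YuNet has significant advantages on edge devices and can support a variety of application scenarios.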
Alternative abstract	With the development of Internet of Things (IoT) technology, face detection plays an increasingly important role in IoT applications and is a key technology for interaction between humans and edge devices. However, face detection on edge devices must cope with hardware constraints such as low power consumption, low compute performance, and limited storage, while still meeting performance targets such as high accuracy, high speed, and high robustness. To solve this problem, this thesis proposes YuNet, an ultra-lightweight face detection algorithm. Following the lightweight model design principles proposed in this thesis, we manually design YuNet's backbone network, neck network, and detection head to greatly reduce the parameter count and computation of the model. For models with such a small parameter count and computational cost, we adopt an anchor-free mechanism, an adaptive label matching strategy, and a sample-balanced data augmentation strategy to achieve high accuracy while maintaining millisecond-level detection speed. YuNet not only outperforms traditional face detection algorithms based on handcrafted features, but also has an order-of-magnitude advantage in parameter count over other deep-learning-based lightweight face detectors of the same accuracy. To our knowledge, YuNet is among the face detectors with the fewest parameters that still maintain high accuracy. We also develop the open-source project libfacedetection based on YuNet; it is implemented in C++ without relying on any third-party libraries, applies code-level SIMD optimization on multiple instruction-set platforms, and supports fast deployment. Experimental results show that YuNet has significant advantages on edge devices and can support various application scenarios.
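Keywords	Face detection; lightweight; edge device; computer vision
Alternative keywords	Face Detection; Lightweight; Edge Device; Computer Vision
Language	Chinese
Training category	Independent training
Year of enrollment	2020
Year degree conferred	2023-06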
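Degree evaluation subcommittee	Electronic Science and Technology
Chinese Library Classification number	TP391.41
Source database	Manual submission
Item type	Thesis
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/544525
Collection	College of Engineering_Department of Computer Science and Engineering
Recommended citation (GB/T 7714)	WU Wei. Research on High-Precision Millisecond-Level Face Detection Technology for Edge Devices (面向边缘设备的高精度毫秒级人脸检测技术研究)[D]. Shenzhen: Southern University of Science and Technology, 2023.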

Files in this item
File name/size	Document type	Version type	Access type	License	Action
12032501-吴伟-计算机科学与工程 (7041 KB)	--	--	Restricted access	--	Request full text