Title

面向边缘设备的高精度毫秒级人脸检测技术研究

Alternative title
Research on High-Precision Millisecond-Level Face Detection Technology for Edge Devices
Name
Name (pinyin)
WU Wei
Student ID
12032501
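Degree type
Master's
Degree program
0809 Electronic Science and Technology
Discipline category / professional degree category
08 Engineering
Supervisor
于仕琪 (YU Shiqi)
Supervisor's affiliation
Department of Computer Science and Engineering
Thesis defense date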
2023-05-13
Thesis submission date
2023-06-29
Degree-granting institution
Southern University of Science and Technology
Place of degree conferral
Shenzhen
Abstract

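With the continued development and spread of Internet of Things (IoT) technology, face detection, an important computer vision task, plays an increasingly important role in IoT applications and is one of the key technologies for intelligent interaction between people and edge devices. However, face detection on edge devices faces many challenges: it must work within hardware constraints such as low power, low compute performance, and limited storage, and it must also deliver high accuracy, high speed, and high robustness while adapting to complex and changing scenes and requirements. To address this problem, this thesis proposes YuNet, an ultra-lightweight face detection algorithm. The thesis puts forward design principles for lightweight models and manually designs the backbone network, neck network, and detection head, which greatly reduces the number of parameters and the amount of computation; while preserving efficiency and high accuracy, this also makes the model simpler and more controllable. YuNet further adopts three techniques (an anchor-free mechanism, an adaptive label matching strategy, and a sample-balanced data augmentation strategy) so that the model maintains high accuracy at millisecond-level detection speed and effectively addresses problems such as small-object detection and imbalanced target distribution. YuNet not only outperforms traditional face detection algorithms based on handcrafted features across the board, but also, compared with other deep-learning-based lightweight face detectors at the same accuracy level, has an order-of-magnitude advantage in parameter count and a clear advantage in speed. To the best of the author's knowledge, YuNet is among the face detectors with the fewest parameters that still maintain high accuracy, and among the fastest. Based on YuNet, this thesis also develops the open-source project libfacedetection, which is implemented in C++ without depending on any third-party library, is accelerated with code-level SIMD optimizations on multiple instruction-set platforms (AVX2/AVX512/NEON), and supports fast deployment. Experimental results show that YuNet has significant advantages on edge devices and can support a variety of application scenarios.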

Alternative abstract

With the development of Internet of Things (IoT) technology, face detection plays an increasingly important role in IoT applications and is a key technology for interaction between humans and edge devices. However, face detection on edge devices must cope with hardware constraints such as low power consumption, low performance, and limited storage, while still meeting requirements for high accuracy, high speed, and high robustness. To solve this problem, this paper proposes YuNet, a super-lightweight face detection algorithm. Based on the lightweight model design principles proposed in this paper, we manually design YuNet's backbone network, neck network, and detection head to greatly reduce the model's parameter count and computation. For models with extremely small parameter counts and computation, we propose an anchor-free mechanism, an adaptive label matching strategy, and a sample-balanced data augmentation strategy to achieve high accuracy while maintaining millisecond-level detection speed. YuNet not only outperforms traditional face detection algorithms based on handcrafted features, but also has an order-of-magnitude advantage in parameter count over other deep-learning-based lightweight face detection networks with the same accuracy. To our knowledge, YuNet is among the face detectors with the fewest parameters that still maintain high accuracy. We also develop the open-source project libfacedetection based on YuNet. It is implemented in C++ without relying on any third-party libraries, applies code-level SIMD optimizations on multiple instruction-set platforms, and supports fast deployment. Experimental results show that YuNet has significant advantages on edge devices and can support various application scenarios.

Keywords
Other keywords
Language
Chinese
Training category
Independent training
Year of enrollment
2020
Year of degree conferral
2023-06

Degree evaluation subcommittee
Electronic Science and Technology
Chinese Library Classification number
TP391.41
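Source database
Manual submission
Output type
Thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544525
Collection
College of Engineering_Department of Computer Science and Engineering
Recommended citation
GB/T 7714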
吴伟. 面向边缘设备的高精度毫秒级人脸检测技术研究[D]. 深圳. 南方科技大学,2023.
Files in this item
File name/size, document type, version type, access type, license
12032501-吴伟-计算机科学与工程 (7041KB), access type: restricted