Title

基于深度学习的点云场景识别

Alternative Title
POINT CLOUD PLACE RECOGNITION BASED ON DEEP LEARNING
Name
汤智龙
Name (Pinyin)
TANG Zhilong
Student ID
12032209
Degree Type
Master
Degree Discipline
0809 Electronic Science and Technology
Discipline Category
08 Engineering
Supervisor
张宏
Supervisor's Department
Department of Electronic and Electrical Engineering
Thesis Defense Date
2023-05-16
Thesis Submission Date
2023-06-28
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract

Place recognition plays an important role in autonomous driving and in Simultaneous Localization and Mapping (SLAM). Given an image or point cloud of a query scene, place recognition tries to find the closest match in an image or point cloud database and to decide whether the two depict the same place. Place recognition and loop closure detection in SLAM solve essentially the same problem, so a place recognition network can serve as the loop closure detection module of a SLAM system. In recent years, several point-based deep learning algorithms for point cloud place recognition have been successful. However, existing algorithms lose accuracy when the scene contains rotational offsets or dynamic objects. This thesis therefore studies these shortcomings of existing deep learning algorithms for point cloud place recognition. The main contributions are as follows:
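The retrieval step described above amounts to a nearest-neighbor search over global descriptors. The following is a minimal sketch of that idea; the descriptors, dimension, and distance threshold are hypothetical illustrations, not values from the thesis:

```python
import numpy as np

def retrieve(query_desc, db_descs, threshold=0.3):
    """Return the index of the closest database descriptor, or None when
    even the best match is farther than `threshold` (Euclidean distance),
    i.e. the query is judged to come from an unseen place."""
    dists = np.linalg.norm(db_descs - query_desc, axis=1)
    best = int(np.argmin(dists))
    return best if dists[best] <= threshold else None

# Toy database of four global descriptors (dimension 3).
db = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [0.7, 0.7, 0.0]])
query = np.array([0.9, 0.1, 0.0])
print(retrieve(query, db))  # row 0 is the closest match
```

In a real system the database descriptors would be produced by the place recognition network and searched with an approximate nearest-neighbor index rather than a brute-force scan.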


  1. Place recognition under rotational offset of the scene point cloud. This thesis proposes POE-Net (Point Octree Encoding Network). POE-Net extracts local point cloud descriptors with a dual-channel octree encoding and strengthens the neighborhood relations among these descriptors with a grouped compensation attention mechanism. POE-Net addresses place recognition when the scene point cloud carries a rotational offset, reaching an average top-1 recall of 91.1% on the Oxford dataset.
  2. Place recognition in highly dynamic environments. This thesis proposes MPOE-Net (Multi-scale Point Octree Encoding Network). MPOE-Net employs multiple Transformer networks to strengthen local neighborhood relations and multiple NetVLAD modules to fuse information across scales. By fusing point cloud features at different scales, MPOE-Net addresses place recognition in highly dynamic scenes, reaching an average top-1 recall of 92.7% on the Oxford dataset.
  3. MPOE-Net is combined with an existing SLAM algorithm as its loop closure detection module, addressing the low loop-closure accuracy of existing SLAM algorithms during large-angle turns and improving their long-term data association.
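The NetVLAD-style aggregation and multi-scale fusion mentioned in item 2 can be illustrated with a small NumPy sketch. The cluster count, feature dimensions, random inputs, and the plain concatenation used for fusion are assumptions for illustration only, not the thesis's trained model:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def netvlad(features, centers, weights):
    """NetVLAD-style aggregation: softly assign each local feature to K
    cluster centers, accumulate residuals per cluster, then L2-normalise
    into one fixed-length global descriptor.
    features: (N, D) local descriptors; centers: (K, D); weights: (D, K)."""
    assign = softmax(features @ weights)                 # (N, K) soft assignment
    resid = features[:, None, :] - centers[None, :, :]   # (N, K, D) residuals
    vlad = (assign[:, :, None] * resid).sum(axis=0)      # (K, D) per-cluster sums
    vlad /= np.linalg.norm(vlad) + 1e-12                 # global L2 normalisation
    return vlad.ravel()                                  # (K*D,) global descriptor

rng = np.random.default_rng(0)
n_pts, dim, k = 128, 8, 4
feats = rng.normal(size=(n_pts, dim))
centers = rng.normal(size=(k, dim))
weights = rng.normal(size=(dim, k))

# A crude stand-in for multi-scale fusion: aggregate two neighbourhood
# scales separately and concatenate the resulting descriptors.
coarse = netvlad(feats, centers, weights)
fine = netvlad(feats[:32], centers, weights)
fused = np.concatenate([coarse, fine])
print(fused.shape)  # (64,)
```

In the trained network, the soft-assignment weights and cluster centers are learned parameters, and the per-scale descriptors come from different neighborhood radii rather than a simple slice of the points.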

In summary, this thesis designs a deep learning network for point cloud place recognition based on dual-channel octree encoding and multi-scale information, and validates its effectiveness on public datasets. MPOE-Net is further applied to LeGO-LOAM, and the resulting SLAM algorithm is evaluated on outdoor public datasets and a self-collected indoor dataset, demonstrating the effectiveness of MPOE-Net as a loop closure detection module.
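The role of a place recognition network as a loop closure module can be sketched as follows. The distance threshold and the rule of skipping the most recent keyframes are illustrative assumptions, not the actual parameters of MPOE-Net or LeGO-LOAM:

```python
import numpy as np

class LoopCloser:
    """Store one global descriptor per keyframe; report a loop closure
    when the current descriptor is close to an old (non-recent) keyframe."""

    def __init__(self, threshold=0.3, min_gap=2):
        self.descs = []            # descriptors of past keyframes
        self.threshold = threshold # max Euclidean distance for a match
        self.min_gap = min_gap     # ignore this many most recent keyframes

    def add_and_query(self, desc):
        """Append `desc` as a new keyframe; return the index of a matching
        old keyframe, or None when no loop closure is detected."""
        match = None
        candidates = self.descs[:-self.min_gap] if self.min_gap else self.descs
        if candidates:
            dists = np.linalg.norm(np.asarray(candidates) - desc, axis=1)
            best = int(np.argmin(dists))
            if dists[best] <= self.threshold:
                match = best
        self.descs.append(desc)
        return match

lc = LoopCloser(threshold=0.1, min_gap=2)
lc.add_and_query(np.array([0.0, 0.0]))          # keyframe 0
lc.add_and_query(np.array([1.0, 0.0]))          # keyframe 1
lc.add_and_query(np.array([2.0, 0.0]))          # keyframe 2
print(lc.add_and_query(np.array([0.05, 0.0])))  # revisits keyframe 0 -> 0
```

Once a match is reported, the SLAM back end would verify it geometrically (e.g. with ICP) and add a loop constraint to the pose graph.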

Keywords
Language
Chinese
Training Category
Independently trained
Year of Enrollment
2020
Year of Degree Conferral
2023-06

Degree Assessment Subcommittee
Electronic Science and Technology
Chinese Library Classification
TP242.6
Source
Manual submission
Item Type
Thesis/Dissertation
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544399
Collection
College of Engineering, Department of Electronic and Electrical Engineering
Recommended Citation (GB/T 7714)
汤智龙. 基于深度学习的点云场景识别[D]. 深圳: 南方科技大学, 2023.