Title

Domain Adaptive Multi-Object Detection for Autonomous Driving Applications (面向自动驾驶应用的域适应多目标检测研究)

Name
丁光耀
Name (Pinyin)
DING Guangyao
Student ID
11930573
Degree Type
Master
Degree Discipline
0809 Electronic Science and Technology
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
郝祁
Supervisor's Affiliation
Department of Computer Science and Engineering
Thesis Defense Date
2022-06-08
Thesis Submission Date
2022-06-15
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Location
Shenzhen
Abstract

Owing to various domain shifts, the detection performance of a model usually drops sharply when it is transferred from the source domain to the target domain. We analyze the problems that arise in real-to-real and simulation-to-real domain transfer, and propose one algorithm for each setting to improve cross-domain detection performance on real-to-real and simulation-to-real tasks.

We propose a Joint Self-Training (JST) framework to improve the cross-domain performance of 2D and 3D detectors in the real-to-real setting. The JST framework contains three innovations that address object size bias and unstable self-training: (1) an anchor scaling scheme that eliminates object size bias without any modification of the point clouds; (2) a 2D&3D bounding box alignment algorithm that generates high-quality pseudo labels for self-training; (3) a model smoothing strategy that reduces oscillation during self-training. Experimental results show that JST improves the performance of 2D and 3D detectors in the target domain simultaneously; for 3D detection in particular, it reaches state-of-the-art results on multiple cross-domain tasks.

Simulation datasets suffer from low diversity, overly simple scenes, and multimodal object size distributions. To address these problems, we propose a Complex-to-Simple (CTS) self-training algorithm to improve the cross-domain performance of 3D detectors in the simulation-to-real setting. The CTS algorithm likewise contains three innovations: (1) an anchor-based second-stage prediction head, which lowers the complexity of the second-stage task and makes it possible to apply various data augmentations to the second-stage input; (2) a data augmentation method applied to the second-stage input, which increases the diversity and complexity of simulated scenes and converts the multimodal object size distribution of the simulation data into a unimodal one (consistent with real data); (3) a mean-teacher-based first/second-stage self-training algorithm, which uses additional data augmentation and the teacher model to force the student model to learn complex scenes in the target domain as well. Experimental results show that CTS greatly improves cross-domain performance on simulation-to-real tasks; in particular, on the simulation-to-KITTI task, CTS achieves an AP_3D@0.7 that is 49.3% higher than direct cross-domain transfer.

Other Abstract

Object detection always suffers from a dramatic performance drop when the model trained in the source domain is transferred to the target domain, due to various domain shifts. We analyze the problems that arise when transferring a detector on real-to-real and simulation-to-real tasks, and propose algorithms for these problems to improve real-to-real and simulation-to-real performance.

We propose a Joint Self-Training (JST) framework to simultaneously improve 2D image and 3D point cloud detectors with aligned outputs during the transfer. The novelty of the proposed framework is threefold, overcoming object size biases and unstable self-training: (1) an anchor scaling scheme that efficiently eliminates object size biases without any modification of the point clouds; (2) a 2D&3D bounding box alignment method that generates high-quality pseudo labels for the self-training process; (3) a model-smoothing-based training strategy that properly reduces training oscillation. Experimental results show that the proposed approach improves the performance of 2D and 3D detectors in the target domain simultaneously; in particular, the 3D detector achieves superior accuracy over state-of-the-art methods on benchmark datasets.
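To make the anchor scaling idea more concrete, below is a minimal sketch of rescaling a 3D detector's anchor sizes from source-domain to target-domain object statistics while leaving the point cloud untouched. It is an illustrative assumption of how such a scheme could look, not the thesis implementation; the class, function names, and size statistics are hypothetical.

    # Hypothetical sketch of an anchor scaling step for cross-domain 3D detection.
    # The mean object sizes below are illustrative, not the thesis's actual statistics.
    from dataclasses import dataclass
    from typing import Tuple


    @dataclass
    class Anchor:
        length: float  # metres
        width: float
        height: float


    def scale_anchor(anchor: Anchor,
                     source_mean: Tuple[float, float, float],
                     target_mean: Tuple[float, float, float]) -> Anchor:
        """Rescale an anchor so its prior size matches the target-domain statistics.

        Only the anchor prior changes; the point cloud itself is left untouched,
        which is the idea behind removing object size bias without modifying data.
        """
        ratios = [t / s for s, t in zip(source_mean, target_mean)]
        return Anchor(anchor.length * ratios[0],
                      anchor.width * ratios[1],
                      anchor.height * ratios[2])


    if __name__ == "__main__":
        # Illustrative mean car sizes (l, w, h) for a large-vehicle source domain
        # and a KITTI-like target domain.
        source_car_mean = (4.7, 2.1, 1.7)
        target_car_mean = (3.9, 1.6, 1.5)
        print(scale_anchor(Anchor(4.7, 2.1, 1.7), source_car_mean, target_car_mean))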

Simulation datasets pose many challenges, such as low diversity, simple scenes, and multimodal object size distributions. We propose a Complex-to-Simple (CTS) algorithm based on self-training to improve 3D detection performance on the simulation-to-real task. The novelty of the proposed CTS algorithm is also threefold, solving the above challenges: (1) an anchor-based second-stage prediction head that reduces the complexity of the second-stage task and also enables various data augmentations on the second-stage input; (2) a data augmentation method that improves the diversity and complexity of the simulation dataset and converts its multimodal object size distribution into a unimodal one (matching real data); (3) a first/second-stage self-training algorithm based on the mean teacher method, which forces the student model to learn complex scenes in the target domain through additional data augmentation and the teacher model. Experimental results show that the CTS algorithm greatly improves the performance of a detector on the simulation-to-real task. In particular, the CTS algorithm achieves 49.3% higher AP_3D@0.7 than the Source Only method on the simulation-to-KITTI task.
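As a rough illustration of the mean-teacher self-training described above, the following sketch shows an EMA teacher update and a single self-training step with confidence-filtered pseudo labels, written in a PyTorch style. It is a hypothetical fragment under assumed interfaces (strong_augment, detection_loss, and the prediction format are placeholders), not the CTS code itself.

    # Hypothetical mean-teacher self-training step; interfaces are assumed, not the thesis code.
    import copy

    import torch


    def make_teacher(student: torch.nn.Module) -> torch.nn.Module:
        """The teacher starts as a frozen copy of the student."""
        teacher = copy.deepcopy(student)
        for p in teacher.parameters():
            p.requires_grad_(False)
        return teacher


    @torch.no_grad()
    def update_teacher(teacher: torch.nn.Module, student: torch.nn.Module,
                       momentum: float = 0.999) -> None:
        """Exponential moving average of the student weights into the teacher."""
        for t_param, s_param in zip(teacher.parameters(), student.parameters()):
            t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)


    def self_training_step(student, teacher, optimizer, batch,
                           strong_augment, detection_loss, score_threshold=0.7):
        # 1. The teacher predicts on the (weakly augmented) target-domain batch.
        with torch.no_grad():
            predictions = teacher(batch)
        # 2. Only confident predictions are kept as pseudo labels.
        pseudo_labels = [p for p in predictions if p["score"] >= score_threshold]
        # 3. The student is trained on a strongly augmented view against the pseudo
        #    labels, pushing it to handle scenes more complex than raw simulation data.
        loss = detection_loss(student(strong_augment(batch)), pseudo_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # 4. The teacher slowly tracks the student via EMA.
        update_teacher(teacher, student)
        return loss.item()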

Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2019
Year of Degree Conferral
2022-06

Degree Assessment Subcommittee
Department of Computer Science and Engineering
Chinese Library Classification (CLC) Number
TP183
Source Repository
Manually submitted
Item Type
Dissertation
Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/335832
Collection
College of Engineering_Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
丁光耀. 面向自动驾驶应用的域适应多目标检测研究[D]. 深圳: 南方科技大学, 2022.
Files in This Item
11930573-丁光耀-计算机科学与工 (48403 KB), restricted access (full text on request)