[1] REN S, HE K, GIRSHICK R, et al. Faster R-CNN: Towards real-time object detection with region proposal networks[C]//Advances in Neural Information Processing Systems: volume 28. 2015.
[2] LIU W, ANGUELOV D, ERHAN D, et al. SSD: Single shot multibox detector[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14. Springer, 2016: 21-37.
[3] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 779-788.
[4] SHUAI B, BERNESHAWI A G, MODOLO D, et al. Multi-object tracking with Siamese Track-RCNN[A]. 2020.
[5] WANG Z, ZHENG L, LIU Y, et al. Towards real-time multi-object tracking[C]//European Conference on Computer Vision. Springer, 2020: 107-122.
[6] WOJKE N, BEWLEY A, PAULUS D. Simple online and realtime tracking with a deep association metric[C]//2017 IEEE International Conference on Image Processing (ICIP). IEEE, 2017: 3645-3649.
[7] ZHU L, JI D, ZHU S, et al. Learning statistical texture for semantic segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 12537-12546.
[8] CHEN L C, PAPANDREOU G, KOKKINOS I, et al. DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 40(4): 834-848.
[9] GEIGER A, LENZ P, URTASUN R. Are we ready for autonomous driving? The KITTI vision benchmark suite[C]//2012 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2012: 3354-3361.
[10] YU F, XIAN W, CHEN Y, et al. BDD100K: A diverse driving video database with scalable annotation tooling[A]. 2018.
[11] CAESAR H, BANKITI V, LANG A H, et al. nuScenes: A multimodal dataset for autonomous driving[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 11621-11631.
[12] HAHNER M, DAI D, SAKARIDIS C, et al. Semantic understanding of foggy scenes with purely synthetic data[C]//2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, 2019: 3675-3681.
[13] XU R, XIANG H, XIA X, et al. OPV2V: An open benchmark dataset and fusion pipeline for perception with vehicle-to-vehicle communication[C]//2022 International Conference on Robotics and Automation (ICRA). IEEE, 2022: 2583-2589.
[14] SHAH S, DEY D, LOVETT C, et al. AirSim: High-fidelity visual and physical simulation for autonomous vehicles[C]//Field and Service Robotics: Results of the 11th International Conference. Springer, 2018: 621-635.
[15] KIM S W, PHILION J, TORRALBA A, et al. DriveGAN: Towards a controllable high-quality neural simulation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 5820-5829.
[16] ZHANG M, ZHANG Y, ZHANG L, et al. DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems[C]//Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 2018: 132-142.
[17] SAKARIDIS C, DAI D, VAN GOOL L. Semantic foggy scene understanding with synthetic data[J]. International Journal of Computer Vision, 2018, 126: 973-992.
[18] WRENNINGE M, UNGER J. Synscapes: A photorealistic synthetic dataset for street scene parsing[A]. 2018.
[19] HALDER S S, LALONDE J F, CHARETTE R D. Physics-based rendering for improving robustness to rain[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 10203-10212.
[20] PORAV H, BRULS T, NEWMAN P. I can see clearly now: Image restoration via de-raining[C]//2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019: 7087-7093.
[21] ALLETTO S, CARLIN C, RIGAZIO L, et al. Adherent raindrop removal with self-supervised attention maps and spatio-temporal generative adversarial networks[C]//2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW). IEEE Computer Society, 2019: 2329-2338.
[22] DOSOVITSKIY A, ROS G, CODEVILLA F, et al. CARLA: An open urban driving simulator[C]//Conference on Robot Learning. PMLR, 2017: 1-16.
[23] ROS G, SELLART L, MATERZYNSKA J, et al. The SYNTHIA dataset: A large collection of synthetic images for semantic segmentation of urban scenes[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 3234-3243.
[24] SUN T, SEGU M, POSTELS J, et al. SHIFT: A synthetic driving dataset for continuous multi-task domain adaptation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 21371-21382.
[25] LIU M Y, BREUEL T, KAUTZ J. Unsupervised image-to-image translation networks[C]//Advances in Neural Information Processing Systems: volume 30. 2017.
[26] LARSEN A B L, SØNDERBY S K, LAROCHELLE H, et al. Autoencoding beyond pixels using a learned similarity metric[C]//International Conference on Machine Learning. PMLR, 2016: 1558-1566.
[27] ZHANG Y, LING H, GAO J, et al. DatasetGAN: Efficient labeled data factory with minimal human effort[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 10145-10155.
[28] LI X, KOU K, ZHAO B. Weather GAN: Multi-domain weather translation using generative adversarial networks[A]. 2021.
[29] GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[C]//Advances in Neural Information Processing Systems: volume 27. 2014.
[30] HERTZMANN A, JACOBS C E, OLIVER N, et al. Image analogies[C]//SIGGRAPH. ACM, 2001: 327-340.
[31] ISOLA P, ZHU J Y, ZHOU T, et al. Image-to-image translation with conditional adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 1125-1134.
[32] ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 2223-2232.
[33] ZHANG Z, YANG L, ZHENG Y. Translating and segmenting multimodal medical volumes with cycle- and shape-consistency generative adversarial network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 9242-9251.
[34] XIE X, CHEN J, LI Y, et al. Self-supervised cyclegan for object-preserving image-to-image domain adaptation[C]//Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XX 16. Springer, 2020: 498-513.
[35] ZHU J Y, ZHANG R, PATHAK D, et al. Toward multimodal image-to-image translation[C]//Advances in Neural Information Processing Systems: volume 30. 2017.
[36] HUANG X, LIU M Y, BELONGIE S, et al. Multimodal unsupervised image-to-image translation[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 172-189.
[37] ULYANOV D, VEDALDI A, LEMPITSKY V. Instance normalization: The missing ingredient for fast stylization[A]. 2016.
[38] LEE H Y, TSENG H Y, HUANG J B, et al. Diverse image-to-image translation via disentangled representations[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 35-51.
[39] ZHANG H, GOODFELLOW I, METAXAS D, et al. Self-attention generative adversarial networks[C]//International Conference on Machine Learning. PMLR, 2019: 7354-7363.
[40] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Advances in Neural Information Processing Systems: volume 30. 2017.
[41] LU Y, LIU J, ZHAO X, et al. Image translation with attention mechanism based on generative adversarial networks[C]//IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS). IEEE, 2020: 364-369.
[42] ALAMI MEJJATI Y, RICHARDT C, TOMPKIN J, et al. Unsupervised attention-guided image-to-image translation[C]//Advances in Neural Information Processing Systems: volume 31. 2018.
[43] CHEN X, XU C, YANG X, et al. Attention-GAN for object transfiguration in wild images[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 164-180.
[44] TANG H, LIU H, XU D, et al. AttentionGAN: Unpaired image-to-image translation using attention-guided generative adversarial networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021.
[45] TANG H, XU D, SEBE N, et al. Attention-guided generative adversarial networks for unsupervised image-to-image translation[C]//2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019: 1-8.
[46] EMAMI H, ALIABADI M M, DONG M, et al. SPA-GAN: Spatial attention GAN for image-to-image translation[J]. IEEE Transactions on Multimedia, 2020, 23: 391-401.
[47] KOMODAKIS N, ZAGORUYKO S. Paying more attention to attention: Improving the performance of convolutional neural networks via attention transfer[C]//International Conference on Learning Representations. 2017.
[48] KIM J, KIM M, KANG H, et al. U-GAT-IT: Unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation[C]//International Conference on Learning Representations. 2019.
[49] ZHOU B, KHOSLA A, LAPEDRIZA A, et al. Learning deep features for discriminative localization[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2921-2929.
[50] YANG R, PENG C, WANG C, et al. CSAGAN: Channel and spatial attention-guided generative adversarial networks for unsupervised image-to-image translation[C]//2021 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2021: 3258-3265.
[51] WOO S, PARK J, LEE J Y, et al. CBAM: Convolutional block attention module[C]//Proceedings of the European Conference on Computer Vision (ECCV). 2018: 3-19.
[52] TANG H, BAI S, SEBE N. Dual attention GANs for semantic image synthesis[C]//Proceedings of the 28th ACM International Conference on Multimedia. 2020: 1994-2002.
[53] MA S, FU J, CHEN C W, et al. DA-GAN: Instance-level image translation by deep attention generative adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 5657-5666.
[54] MO S, CHO M, SHIN J. InstaGAN: Instance-aware image-to-image translation[C]//International Conference on Learning Representations. 2019.
[55] SHEN Z, HUANG M, SHI J, et al. Towards instance-level image-to-image translation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3683-3692.
[56] BHATTACHARJEE D, KIM S, VIZIER G, et al. DUNIT: Detection-based unsupervised image-to-image translation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 4787-4796.
[57] KINGMA D P, WELLING M. Auto-encoding variational bayes[A]. 2013.
[58] ZHANG W, LIU Y, DONG C, et al. RankSRGAN: Generative adversarial networks with ranker for image super-resolution[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 3096-3105.
[59] WANG Y, PERAZZI F, MCWILLIAMS B, et al. A fully progressive approach to single-image super-resolution[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018: 864-873.
[60] YUAN Y, LIU S, ZHANG J, et al. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2018: 701-710.
[61] LEDIG C, THEIS L, HUSZÁR F, et al. Photo-realistic single image super-resolution using a generative adversarial network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 4681-4690.
[62] JOHNSON J, ALAHI A, FEI-FEI L. Perceptual losses for real-time style transfer and super-resolution[C]//Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part II 14. Springer, 2016: 694-711.
[63] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 770-778.
[64] CHEN X, DUAN Y, HOUTHOOFT R, et al. InfoGAN: Interpretable representation learning by information maximizing generative adversarial nets[C]//Advances in Neural Information Processing Systems: volume 29. 2016.
[65] ULYANOV D, VEDALDI A, LEMPITSKY V. Improved texture networks: Maximizing quality and diversity in feed-forward stylization and texture synthesis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 6924-6932.
[66] HUANG X, BELONGIE S. Arbitrary style transfer in real-time with adaptive instance normalization[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 1501-1510.
[67] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 7794-7803.
[68] GIRSHICK R. Fast R-CNN[C]//Proceedings of the IEEE International Conference on Computer Vision. 2015: 1440-1448.
[69] GIRSHICK R, DONAHUE J, DARRELL T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 580-587.
[70] DAI J, LI Y, HE K, et al. R-FCN: Object detection via region-based fully convolutional networks[C]//Advances in Neural Information Processing Systems: volume 29. 2016.
[71] REDMON J, FARHADI A. YOLOv3: An incremental improvement[A]. 2018.
[72] BOCHKOVSKIY A, WANG C Y, LIAO H Y M. YOLOv4: Optimal speed and accuracy of object detection[A]. 2020.
[73] JOCHER G, STOKEN A, BOROVEC J, et al. ultralytics/yolov5: v5.0 - YOLOv5-P6 1280 models, AWS, Supervise.ly and YouTube integrations[J]. Zenodo, 2021.
[74] ZHOU X, ZHUO J, KRAHENBUHL P. Bottom-up object detection by grouping extreme and center points[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 850-859.
[75] TIAN Z, SHEN C, CHEN H, et al. FCOS: A simple and strong anchor-free object detector[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 44(4): 1922-1933.
[76] KONG T, SUN F, LIU H, et al. FoveaBox: Beyound anchor-based object detection[J]. IEEE Transactions on Image Processing, 2020, 29: 7389-7398.
[77] YUN S, HAN D, OH S J, et al. CutMix: Regularization strategy to train strong classifiers with localizable features[C]//Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019: 6023-6032.
[78] KARRAS T, AILA T, LAINE S, et al. Progressive growing of GANs for improved quality, stability, and variation[C]//International Conference on Learning Representations. 2018.
[79] WU Y, ZHANG R, YANAI K. Attention guided unsupervised image-to-image translation with progressively growing strategy[C]//Asian Conference on Pattern Recognition. Springer, 2019: 85-99.
[80] WANG T C, LIU M Y, ZHU J Y, et al. High-resolution image synthesis and semantic manipulation with conditional GANs[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 8798-8807.
[81] KAMRAN S A, HOSSAIN K F, TAVAKKOLI A, et al. RV-GAN: Segmenting retinal vascular structure in fundus photographs using a novel multi-scale generative adversarial network[C]//Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, September 27–October 1, 2021, Proceedings, Part VIII 24. Springer, 2021: 34-44.
[82] FU J, LIU J, TIAN H, et al. Dual attention network for scene segmentation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 3146-3154.
[83] ZHAO H, SHI J, QI X, et al. Pyramid scene parsing network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2881-2890.
[84] ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: Faster and better learning for bounding box regression[C]//Proceedings of the AAAI Conference on Artificial Intelligence: volume 34. 2020: 12993-13000.
[85] XU H, GAO Y, YU F, et al. End-to-end learning of driving models from large-scale video datasets[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017: 2174-2182.
[86] SUN P, KRETZSCHMAR H, DOTIWALLA X, et al. Scalability in perception for autonomous driving: Waymo open dataset[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020: 2446-2454.
[87] SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[C]//Advances in Neural Information Processing Systems: volume 29. 2016.
[88] HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]//Advances in Neural Information Processing Systems: volume 30. 2017.
[89] BIŃKOWSKI M, SUTHERLAND D J, ARBEL M, et al. Demystifying MMD GANs[C]//International Conference on Learning Representations. 2018.
[90] SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016: 2818-2826.
[91] PASZKE A, GROSS S, MASSA F, et al. PyTorch: An imperative style, high-performance deep learning library[C]//Advances in Neural Information Processing Systems: volume 32. 2019.
[92] FARAHANI A, VOGHOEI S, RASHEED K, et al. A brief review of domain adaptation[C]//Advances in Data Science and Information Engineering: Proceedings from ICDATA 2020 and IKE 2020. Springer, 2021: 877-894.
[93] KINGMA D P, BA J. Adam: A method for stochastic optimization[A]. 2014.
[94] SHARMA A, TAN R T. Nighttime visibility enhancement by increasing the dynamic range and suppression of light effects[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 11977-11986.
[95] JEONG S, KIM Y, LEE E, et al. Memory-guided unsupervised image-to-image translation[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 6558-6567.
[96] KIM S, BAEK J, PARK J, et al. InstaFormer: Instance-aware image-to-image translation with transformer[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2022: 18321-18331.
[97] GUO X, LI Y, LING H. LIME: Low-light image enhancement via illumination map estimation[J]. IEEE Transactions on Image Processing, 2017, 26(2): 982-993.
[98] LI Y, BROWN M S. Single image layer separation using relative smoothness[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2014: 2752-2759.
[99] ZHANG X, NG R, CHEN Q. Single image reflection separation with perceptual losses[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018: 4786-4794.
[100] CHEN Q, KOLTUN V. Photographic image synthesis with cascaded refinement networks[C]//Proceedings of the IEEE International Conference on Computer Vision. 2017: 1511-1520.