Title

基于合作与竞争的多智能体强化学习车流生成 (Vehicle Flow Generation Based on Cooperative and Competitive Multi-Agent Reinforcement Learning)

Alternative Title
REINFORCEMENT LEARNING BASED VEHICLE FLOW GENERATION VIA COOPERATIVE AND COMPETITIVE MULTI-AGENTS
Name
王鹏宇
Name (Pinyin)
WANG Pengyu
Student ID
12032464
Degree Type
Master's
Degree Discipline
0809 Electronic Science and Technology
Subject Category / Professional Degree Category
08 Engineering
Supervisor
郝祁 (HAO Qi)
Supervisor's Affiliation
Department of Computer Science and Engineering
Thesis Defense Date
2023-05-13
Thesis Submission Date
2023-06-25
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Location
Shenzhen
Abstract

  Traffic simulation is an important means of training and testing autonomous driving algorithms, and the generation of vehicle flows is its key component: high-quality simulated vehicle-flow data can substantially improve the training of decision-making and planning algorithms for autonomous driving. Real vehicle flows contain diverse and even dangerous vehicle behaviors, yet current research faces the following technical challenges and struggles to control and generate simulated flows with these characteristics: (1) the lack of quantitative evaluation and control methods for the quality of generated flows, so the generated flows are uniform and mediocre; (2) the lack of controllable collision-avoidance flow generation methods, so the generated flows cannot balance collision risk against realism; (3) the lack of research on the interaction styles between vehicles, which ignores the influence of driving style on flow quality.

  To address these problems, this thesis quantifies and grades vehicle-flow quality through a flow-complexity measure and proposes a complexity-controllable flow generation framework that produces simulated vehicle flows of different complexity levels. The main contributions are: (1) a physics-based safety awareness and a driving-style-based cooperation awareness, through which the flow-complexity metrics are defined and controlled; (2) the introduction of velocity-obstacle collision avoidance into the reward design of multi-agent reinforcement learning to quantify safety awareness, so that safety awareness controls the complexity of the generated flows; (3) the use of the cooperative and competitive mechanisms of multi-agent reinforcement learning to train flows with different driving styles and thereby grade cooperation awareness, so that cooperation awareness also controls the complexity of the generated flows. In addition, two vehicle kinematic models, a differential-drive model and a bicycle model, are introduced to improve the generalization of the proposed method to real systems.
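
  The abstract names these ingredients only at a high level. As a minimal, hypothetical sketch of two of them, the Python snippet below shows a kinematic bicycle-model state update and a velocity-obstacle-flavored safety penalty of the kind that could be folded into a multi-agent RL reward; the function names, the parameters (wheelbase, safety radius, look-ahead horizon), and the simplified closest-approach test are assumptions for illustration, not the thesis's actual formulation.

    import numpy as np

    def bicycle_step(x, y, theta, v, steer, dt=0.1, wheelbase=2.5):
        # Kinematic bicycle model: advance the pose (x, y, heading) by one step.
        x += v * np.cos(theta) * dt
        y += v * np.sin(theta) * dt
        theta += v / wheelbase * np.tan(steer) * dt
        return x, y, theta

    def safety_reward(p_ego, v_ego, p_other, v_other, radius=2.0, horizon=3.0):
        # Velocity-obstacle-style penalty (simplified): -1 if, keeping the current
        # relative velocity, the two vehicles come within 2*radius of each other
        # inside the look-ahead horizon; 0 otherwise.
        rel_p = np.asarray(p_other, float) - np.asarray(p_ego, float)
        rel_v = np.asarray(v_ego, float) - np.asarray(v_other, float)
        speed2 = float(np.dot(rel_v, rel_v))
        if speed2 < 1e-9:
            return 0.0
        t_cpa = np.clip(np.dot(rel_p, rel_v) / speed2, 0.0, horizon)  # time of closest approach
        closest = np.linalg.norm(rel_p - rel_v * t_cpa)
        return -1.0 if closest < 2.0 * radius else 0.0

  In a multi-agent setting, a weighted sum of such a safety term over neighboring vehicles would be one way to expose the generated flow's complexity through a single tunable safety coefficient.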

  The proposed methods are tested and validated in a 2D simulation environment based on OpenAI Gym and in the Carla 3D simulation environment built on the Unreal Engine. Experimental results show that the proposed approach can effectively control the generation of vehicle flows with different complexity metrics, and that the autonomous vehicle under test exhibits significant performance differences, consistent with those metrics, across generated flows of different complexity, demonstrating the quantitative controllability of the generation method.
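
  As a rough, hypothetical sketch of such a test loop (not the thesis's actual harness), a driving policy under test could be evaluated against generated flows of different complexity levels in a Gym-style 2D environment roughly as follows; the environment id TrafficFlow2D-v0, its complexity option, and the info["collision"] flag are assumed names, and the classic gym API is used.

    import gym
    import numpy as np

    def evaluate(policy, complexity_level, episodes=50):
        # Run the policy under test in flows of one complexity level and
        # report the collision rate and the mean episode return.
        env = gym.make("TrafficFlow2D-v0", complexity=complexity_level)
        collisions, returns = 0, []
        for _ in range(episodes):
            obs, done, total = env.reset(), False, 0.0
            while not done:
                obs, reward, done, info = env.step(policy(obs))
                total += reward
            collisions += int(info.get("collision", False))
            returns.append(total)
        env.close()
        return collisions / episodes, float(np.mean(returns))

    # Compare the same policy across complexity levels:
    # for level in ("low", "medium", "high"):
    #     print(level, evaluate(my_policy, level))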

Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2020
Year Degree Conferred
2023-06
Degree Evaluation Subcommittee
Electronic Science and Technology
Chinese Library Classification Number
TP18
Source Repository
Manual submission
Item Type
Degree thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544613
Collection
College of Engineering, Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
王鹏宇. 基于合作与竞争的多智能体强化学习车流生成[D]. 深圳: 南方科技大学, 2023.
Files in This Item
12032464-王鹏宇-计算机科学与工 (5625 KB), restricted access; full text available on request