Title

基于深度强化学习的自动驾驶车辆横向控制算法研究

Alternative Title
Research on Lateral Control Algorithm of Autonomous Vehicle Based on Deep Reinforcement Learning
Name
孙斯嘉
Name (Pinyin)
SUN Sijia
Student ID
12032389
Degree Type
Master
Degree Discipline
0801 Mechanics
Subject Category
08 Engineering
Supervisor
袁鸿雁
Supervisor's Department
Department of Mechanics and Aerospace Engineering
Thesis Defense Date
2023-05-18
Thesis Submission Date
2023-07-03
Degree-Granting Institution
Southern University of Science and Technology (SUSTech)
Degree-Granting Location
Shenzhen
Abstract

In recent years, the active safety and ride comfort of autonomous driving have attracted wide attention from both the research community and industry. The trajectory-tracking module of an autonomous driving system outputs the vehicle's throttle, brake, and steering commands, and thus strongly affects vehicle safety and ride quality. The vehicle's front-wheel steering angle is determined by the lateral control algorithm within this module. However, many traditional lateral control algorithms focus only on trajectory-tracking accuracy while neglecting the passenger experience, i.e., the smoothness of vehicle motion, and also suffer from poor real-time performance and an inability to adapt to complex environments. Against this background, this thesis proposes a lateral control algorithm for autonomous vehicles based on deep reinforcement learning, aiming at a better balance between tracking accuracy and driving smoothness.
First, the thesis introduces the single-track (bicycle) vehicle dynamics model and derives its governing equations. The model is simplified into a system of linear differential equations and integrated with the forward Euler method to build a vehicle dynamics simulation environment, whose stability is verified through co-simulation. On top of this environment, the geometric Pure-Pursuit control algorithm and the proportional-integral-derivative (PID) control algorithm are implemented. The influence of the look-ahead distance on the performance of the Pure-Pursuit controller is analyzed, and the input error of the trajectory-tracking PID controller is refined using geometric relationships, improving its performance. To jointly optimize tracking accuracy and ride smoothness, the Pure-Pursuit and PID algorithms are combined into the PP-PID control algorithm, which provides both feedforward and feedback control; its effectiveness is verified through comparative analysis.
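The controller combination described above can be sketched in a few lines. This is a generic illustration, not the thesis's actual implementation: the wheelbase, time step, gains, and mixing weight are hypothetical placeholders, and a simple kinematic bicycle model stands in for the thesis's dynamic single-track model.

```python
import math

L = 2.7    # wheelbase [m] (hypothetical value)
DT = 0.01  # forward-Euler time step [s] (hypothetical value)

def pure_pursuit_steer(alpha, lookahead):
    """Geometric Pure-Pursuit law: steer toward a point `lookahead`
    metres ahead; `alpha` is the heading error to that point [rad]."""
    return math.atan2(2.0 * L * math.sin(alpha), lookahead)

class PID:
    """Feedback controller acting on the lateral tracking error."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err):
        self.integral += err * DT
        deriv = (err - self.prev_err) / DT
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

def pp_pid_steer(alpha, lookahead, lateral_err, pid, w=0.5):
    """Feedforward (Pure-Pursuit) plus feedback (PID), mixed by weight w."""
    return w * pure_pursuit_steer(alpha, lookahead) + (1.0 - w) * pid.step(lateral_err)

def euler_step(x, y, yaw, v, steer):
    """One forward-Euler step of a kinematic bicycle model."""
    x += v * math.cos(yaw) * DT
    y += v * math.sin(yaw) * DT
    yaw += v / L * math.tan(steer) * DT
    return x, y, yaw
```

With zero heading and lateral error both terms vanish and the commanded steering angle is zero; a larger `lookahead` flattens the Pure-Pursuit response, which is the trade-off the thesis analyzes.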
Second, to further improve the performance of the PP-PID controller, the thesis applies an on-policy deep reinforcement learning algorithm, Proximal Policy Optimization (PPO), to adjust the weights of the two sub-controllers within the PP-PID controller in real time, yielding the RL-PP-PID control algorithm. The thesis then details the basic framework of the simulation environment and the design of the state space, the reward function, and the neural network architecture. Advantage normalization and gradient clipping are used to speed up training and improve the algorithm's performance.
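The two training tricks mentioned, advantage normalization and gradient clipping, can be sketched generically in NumPy. This is an illustration of the standard techniques, not the thesis's PPO code; the epsilon constants and norm threshold are conventional placeholder values.

```python
import numpy as np

def normalize_advantages(adv, eps=1e-8):
    """Rescale a batch of advantage estimates to zero mean and unit
    variance, stabilising the PPO policy-gradient update across
    batches of different reward scales."""
    adv = np.asarray(adv, dtype=np.float64)
    return (adv - adv.mean()) / (adv.std() + eps)

def clip_grad_norm(grads, max_norm, eps=1e-8):
    """Rescale a list of gradient arrays so their global L2 norm does
    not exceed `max_norm`, preventing destructively large updates.
    Returns the clipped gradients and the pre-clipping norm."""
    total_norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm
```

Both operations are applied per minibatch during PPO updates: advantages are normalized before computing the clipped surrogate loss, and gradients are norm-clipped just before the optimizer step.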
Finally, the algorithm is trained in the vehicle dynamics simulation environment. To verify its generalization ability, it is tested in four scenarios of different difficulty levels. The results show that, compared with the PP-PID control algorithm without reinforcement learning, the proposed algorithm achieves a better balance between tracking accuracy and ride smoothness. Compared with several more complex lateral control algorithms, the RL-PP-PID algorithm achieves similar or even better performance with a relatively simple framework, fully verifying its effectiveness and superiority. In addition, to simulate sensor errors in real environments, Gaussian noise of different magnitudes is randomly added to the vehicle position; the control performance of the RL-PP-PID controller remains within an acceptable range in all cases, verifying the robustness of the algorithm.
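The robustness check described above, perturbing the measured vehicle position with zero-mean Gaussian noise, amounts to something like the following sketch. The noise level `sigma` is a hypothetical placeholder; the thesis does not specify the magnitudes used.

```python
import numpy as np

def noisy_position(x, y, sigma, rng=None):
    """Simulate localisation error by adding independent zero-mean
    Gaussian noise of standard deviation `sigma` [m] to the true
    position before it is fed to the controller."""
    rng = rng if rng is not None else np.random.default_rng()
    nx, ny = rng.normal(0.0, sigma, size=2)
    return x + nx, y + ny
```

Running the trained controller on positions perturbed this way, at several values of `sigma`, is a common way to probe how tracking accuracy degrades with sensor noise.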

Keywords
Language
Chinese
Training Category
Independent
Year of Enrollment
2020
Year Degree Conferred
2023-06

Degree Assessment Sub-committee
Mechanics
Chinese Library Classification (CLC) Number
TP272
Source Repository
Manual submission
Document Type: Degree thesis
Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/545024
Collection: College of Engineering, Department of Mechanics and Aerospace Engineering
Recommended Citation (GB/T 7714)
孙斯嘉. 基于深度强化学习的自动驾驶车辆横向控制算法研究[D]. 深圳. 南方科技大学,2023.
Files in This Item
12032389-孙斯嘉-力学与航空航天 (4018KB), restricted access (full text available on request)
