Title

Research on Deep Reinforcement Learning Algorithms for Combinatorial Optimization Problems

Other Title
Combinatorial Optimization Problem with Deep Reinforcement Learning
Name
姚舜禹
Name in Pinyin
YAO Shunyu
Student ID
12032920
Degree Type
Master
Degree Discipline
0809 Electronic Science and Technology
Subject Category / Professional Degree Category
08 Engineering
Supervisor
王振坤
Supervisor Affiliation
School of System Design and Intelligent Manufacturing
Thesis Defense Date
2023-05-18
Thesis Submission Date
2023-06-29
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Place
Shenzhen
Abstract

Combinatorial optimization refers to a broad class of optimization problems. Because it is closely related to numerous real-world application scenarios, it has long been an important research area in computer science and operations research. The traveling salesman problem is a representative problem in this class: it requires a salesman to visit all nodes and return to the starting node at the lowest possible route cost. The problem is closely related to logistics and transportation planning and has been widely studied. However, since the problem is NP-hard, an exact optimal solution cannot be obtained in polynomial time. This work proposes three deep learning methods that efficiently generate high-quality approximate solutions to the traveling salesman problem.

First, this thesis proposes a data-efficient deep learning method that uses data augmentation to obtain sufficient training data. A novel bidirectional loss function is also proposed, which exploits the property that a traveling salesman problem instance has multiple equivalent optimal solutions in order to improve learning efficiency. This new method effectively alleviates the low data efficiency and sparse rewards of deep reinforcement learning. The results show that the method improves training efficiency and obtains near-optimal solutions on problem instances of different sizes at very low computational cost.

In addition, real-world route planning problems often have to consider multiple objectives at the same time, such as distance, time, and cost, so the traveling salesman problem extends naturally to the multi-objective traveling salesman problem. Since the objective functions usually conflict with one another, no single solution can be optimal for all objectives simultaneously; solving the multi-objective traveling salesman problem therefore requires generating a set of Pareto-optimal solutions that balance the objectives. To this end, this work proposes two deep reinforcement learning methods.

This thesis proposes a deep reinforcement learning method that solves the multi-objective traveling salesman problem end to end with a single model. The method introduces a novel routing encoder that adaptively mixes objective information and global information according to the weight vectors, on top of which a context encoding and a Top-k baseline are proposed. The lightweight deep reinforcement learning model can be trained in parallel at low cost, and the trained model can quickly generate a set of high-quality approximate Pareto-optimal solutions; it remains effective on unseen, larger-scale problem instances.

This thesis also proposes a single-model learning-improvement method for the multi-objective traveling salesman problem based on deep reinforcement learning, which innovatively combines deep reinforcement learning with heuristic operators. The results show that this combination can solve the multi-objective traveling salesman problem effectively and offers a new line of attack.

In summary, this thesis proposes the above three deep learning algorithms to efficiently generate high-quality approximate solutions to the traveling salesman problem, demonstrating the great potential of deep learning techniques for solving combinatorial optimization problems.

Other Abstract

Combinatorial optimization is a class of optimization problems. Because it is closely related to many real-world scenarios, it has always been an important research field in computer science and operations research. As a representative combinatorial optimization problem, the traveling salesman problem requires the salesman to pass through all nodes and return to the starting node with the lowest route cost. It is closely related to logistics transportation and traffic planning and has been widely studied in the past decades. However, as the problem is NP-hard, the exact optimal solution cannot be obtained in polynomial time. This thesis proposes three deep learning methods that efficiently generate high-quality approximate solutions to the traveling salesman problem.
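For readers less familiar with the problem, the quantity being minimized is simply the total length of a closed tour over all nodes. A minimal plain-Python sketch with made-up coordinates is given below; it evaluates tour lengths and finds the exact optimum of a five-city instance by brute force, which also makes clear why exhaustive search cannot scale to realistic instance sizes.

import itertools, math

# Hypothetical five-city instance: (x, y) coordinates in the unit square.
cities = [(0.0, 0.0), (0.9, 0.1), (0.8, 0.7), (0.2, 0.9), (0.1, 0.4)]

def tour_length(order):
    """Total Euclidean length of the closed tour visiting the cities in `order`."""
    total = 0.0
    for i in range(len(order)):
        x1, y1 = cities[order[i]]
        x2, y2 = cities[order[(i + 1) % len(order)]]   # wrap around to the start
        total += math.hypot(x2 - x1, y2 - y1)
    return total

# Exact optimum by brute force: only feasible for tiny instances, since the
# number of distinct tours grows factorially with the number of cities.
best = min(itertools.permutations(range(len(cities))), key=tour_length)
print(best, round(tour_length(best), 3))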

First, this thesis proposes a data-efficient deep learning method that uses data augmentation to obtain sufficient training data. A novel bidirectional loss is also proposed to improve training efficiency by exploiting the property that a traveling salesman problem instance has multiple equivalent optimal solutions. This method effectively alleviates the issues of low data efficiency and sparse rewards in deep reinforcement learning. The results show that it improves training efficiency and cheaply obtains near-optimal solutions on instances of different sizes.
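The thesis's exact formulation is not reproduced in this abstract, so the sketch below only illustrates the two ingredients in a simplified, assumed form: coordinate rotations and reflections that leave all tour lengths unchanged (so one labelled instance yields several equivalent training instances), and a loss that scores a reference tour in both traversal directions and keeps the cheaper one, since both directions are equally optimal. Function names, tensor shapes, and the supervised form of the loss are illustrative assumptions, not the thesis's definitions.

import numpy as np

def augment(coords, k, flip=False):
    """Rotate an instance by k*90 degrees about the centre of the unit square,
    optionally reflecting it first.  Tour lengths are invariant under these maps,
    so each instance yields up to eight equivalent training instances."""
    c = coords - 0.5
    if flip:
        c[:, 0] = -c[:, 0]
    theta = k * np.pi / 2
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return c @ rot.T + 0.5

def bidirectional_nll(log_probs, ref_tour):
    """Negative log-likelihood of a reference tour, taking the better of its two
    traversal directions.  log_probs[t, j] is the log-probability the model
    assigns to visiting node j at decoding step t."""
    rev_tour = np.concatenate((ref_tour[:1], ref_tour[1:][::-1]))  # same cycle, reversed
    steps = np.arange(len(ref_tour))
    fwd = -log_probs[steps, ref_tour].sum()
    rev = -log_probs[steps, rev_tour].sum()
    return min(fwd, rev)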

In addition, real-world path planning problems often consider many objectives at the same time, such as distance, time, and cost. Naturally, the traveling salesman problem can be extended to the multi-objective traveling salesman problem. Since the objective functions often conflict, no single solution can be optimal for all objectives simultaneously; instead, there is a set of Pareto-optimal solutions that trade off the different objectives. To this end, this thesis proposes two deep reinforcement learning methods.
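To make the Pareto set concrete, the sketch below evaluates candidate tours under two made-up cost matrices (say distance and time) and keeps only the non-dominated ones. It illustrates the concept itself rather than any of the thesis's algorithms; the candidate tours are simply random permutations.

import numpy as np

rng = np.random.default_rng(0)
n = 8
dist = rng.random((n, n))   # objective 1, e.g. travel distance
time = rng.random((n, n))   # objective 2, e.g. travel time

def costs(tour):
    """Return the (distance, time) cost pair of a closed tour."""
    edges = list(zip(tour, np.roll(tour, -1)))
    return (sum(dist[i, j] for i, j in edges),
            sum(time[i, j] for i, j in edges))

def pareto_front(tours):
    """Keep the tours whose cost vector is not dominated by any other candidate."""
    evaluated = [(t, costs(t)) for t in tours]
    front = []
    for t, c in evaluated:
        dominated = any(all(o <= ci for o, ci in zip(oc, c)) and oc != c
                        for _, oc in evaluated)
        if not dominated:
            front.append((t, c))
    return front

candidates = [rng.permutation(n) for _ in range(200)]
for tour, c in pareto_front(candidates):
    print(np.round(c, 3))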

This thesis proposes a deep reinforcement learning method that solves the multi-objective traveling salesman problem end to end with a single model. The method introduces a novel routing encoder that adaptively aggregates objective information and global information according to the weight vectors, together with a modified context encoding and a Top-k baseline. The lightweight deep reinforcement learning model can be trained in parallel at low cost, and the trained model can generate a set of approximate Pareto-optimal solutions in real time; it still works well on unseen larger-scale instances.
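The actual encoder, context encoding, and baseline are specified in the thesis itself; the numpy sketch below only illustrates the underlying idea of preference conditioning that such a single-model approach relies on: per-objective node embeddings are mixed according to a weight vector on the simplex and combined with a mean-pooled global context, so that one set of parameters can serve every trade-off. The dimensions, names, and the plain weighted-sum mixing rule are assumptions for illustration.

import numpy as np

def weight_vectors(steps):
    """Uniformly spread preference vectors on the two-objective simplex."""
    w = np.linspace(0.0, 1.0, steps)
    return np.stack([w, 1.0 - w], axis=1)

def mix_embeddings(obj_embeddings, lam):
    """Mix per-objective node embeddings with the preference `lam` and append a
    mean-pooled global context to every node.

    obj_embeddings: (m, n, d)  one embedding per objective per node
    lam:            (m,)       a point on the preference simplex
    returns:        (n, 2 * d) preference-conditioned node features
    """
    mixed = np.tensordot(lam, obj_embeddings, axes=1)          # (n, d) weighted sum
    global_ctx = mixed.mean(axis=0, keepdims=True)             # (1, d) graph context
    return np.concatenate([mixed, np.broadcast_to(global_ctx, mixed.shape)], axis=-1)

# Example: 2 objectives, 10 nodes, 16-dimensional embeddings, 5 preferences.
rng = np.random.default_rng(0)
embs = rng.standard_normal((2, 10, 16))
for lam in weight_vectors(5):
    feats = mix_embeddings(embs, lam)   # would be fed to a shared decoder
    print(lam, feats.shape)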

This thesis also proposes a learning-improvement method built on a deep reinforcement learning model, which innovatively combines deep reinforcement learning with heuristic operators. The results show that this combination can effectively solve the multi-objective traveling salesman problem and points to a new direction.
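The learning-improvement idea can be pictured as a local-search loop whose move selection is delegated to a policy. In the thesis that policy is a trained deep reinforcement learning model applied to the multi-objective problem; in the single-objective placeholder below the "policy" simply picks the best 2-opt move, so only the overall control flow is meant to be representative.

import numpy as np

rng = np.random.default_rng(0)
coords = rng.random((20, 2))
dmat = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

def tour_len(tour):
    return dmat[tour, np.roll(tour, -1)].sum()

def two_opt_move(tour, i, j):
    """Apply the classic 2-opt heuristic operator: reverse the segment tour[i:j+1]."""
    new = tour.copy()
    new[i:j + 1] = tour[i:j + 1][::-1]
    return new

def select_move(tour):
    """Placeholder policy: greedily pick the best-improving 2-opt move.
    A learning-improvement method would instead let a trained policy network
    score the (i, j) pairs."""
    base = tour_len(tour)
    best, best_gain = None, 1e-9
    for i in range(1, len(tour) - 1):
        for j in range(i + 1, len(tour)):
            gain = base - tour_len(two_opt_move(tour, i, j))
            if gain > best_gain:
                best, best_gain = (i, j), gain
    return best

tour = rng.permutation(20)
while (move := select_move(tour)) is not None:   # iterate until no improving move remains
    tour = two_opt_move(tour, *move)
print(round(tour_len(tour), 3))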

This thesis proposes the above three deep learning algorithms to efficiently generate high-quality approximate solutions to the traveling salesman problem, demonstrating the great potential of using deep learning techniques to solve combinatorial optimization problems.

Keywords
Other Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2020
Year of Degree Conferral
2023-06

Degree Evaluation Subcommittee
Electronic Science and Technology
Chinese Library Classification Number
TP183
Source Repository
Manual submission
Document Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544670
Collection
College of Engineering_School of System Design and Intelligent Manufacturing
Recommended Citation
GB/T 7714
YAO Shunyu. Research on Deep Reinforcement Learning Algorithms for Combinatorial Optimization Problems[D]. Shenzhen: Southern University of Science and Technology, 2023.