Title

面向金融云的多目标强化学习负载均衡算法 (Multi-Objective Reinforcement Learning Load Balancing Algorithm for Financial Cloud)

Alternative Title
Multi-Objective Reinforcement Learning Load Balancing Algorithm for Financial Cloud
Name
张烙铭
Name (Pinyin)
ZHANG Laoming
Student ID
12032505
Degree Type
Master's
Degree Discipline
0809 Electronic Science and Technology
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
杨鹏 (YANG Peng)
Supervisor's Department
Department of Statistics and Data Science
Thesis Defense Date
2023-05-13
Thesis Submission Date
2023-06-22
Degree-Granting Institution
Southern University of Science and Technology (南方科技大学)
Degree-Granting Location
Shenzhen
Abstract

Cloud computing information service systems in the financial sector are characterized by large fluctuations in the number of users and by high user demands for stability. To address the server idle time and wasted resources caused in financial clouds by the unreasonable allocation of user requests under traditional load balancing strategies, this thesis studies several multi-objective load balancing algorithms based on reinforcement learning, aiming to balance server resource load and minimize idle time while guaranteeing that no user request is disconnected. The thesis models the problem scenario, analyzes in detail how user connection duration affects server idle time in the financial cloud setting, and defines optimization objective functions for load balancing and idle time. Three reinforcement learning algorithms are proposed to solve the problem: PPO-LB, IPG-LB, and MERL-LB. PPO-LB, built on the proximal policy optimization (PPO) algorithm, optimizes both objectives simultaneously and, while achieving good load balancing, shortens idle time by 20-30% compared with heuristic algorithms. IPG-LB, built on an input-independent policy gradient algorithm, improves the value estimation of PPO-LB and increases training stability and convergence speed. MERL-LB optimizes the neural network parameters with an evolutionary reinforcement learning algorithm and can therefore produce a diverse set of policies at once; experiments show that MERL-LB outperforms the other algorithms in both objective performance and policy diversity.
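The record does not reproduce the thesis's formal objective definitions. As a rough illustration of how the two goals above could be formalized (the notation below, including u_i(t), c_i(t), N, and T, is assumed for this sketch and is not taken from the thesis), one might write:

% Illustrative only: assumed notation, not the thesis's actual formulation.
% Load-balance objective: minimize the time-averaged variance of server utilization u_i(t) over N servers.
\[
\min\; J_{\text{balance}} = \frac{1}{T}\sum_{t=1}^{T}\frac{1}{N}\sum_{i=1}^{N}\bigl(u_i(t)-\bar{u}(t)\bigr)^{2},
\qquad \bar{u}(t)=\frac{1}{N}\sum_{i=1}^{N}u_i(t)
\]
% Idle-time objective: minimize the total time servers spend running with no active connection,
% where c_i(t) is the number of live connections on server i at time t.
\[
\min\; J_{\text{idle}} = \sum_{i=1}^{N}\sum_{t=1}^{T}\mathbb{1}\bigl[c_i(t)=0\bigr]
\]
% Constraint: once a request is assigned, its connection is served until it ends (no disconnection).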

Alternative Abstract

Financial cloud computing information service systems are characterized by large fluctuations in the number of users and by high user requirements for stability. To address the idle time and wasted resources that arise in financial clouds from the unreasonable allocation of user requests by traditional load balancing strategies, this thesis investigates several multi-objective load balancing algorithms based on reinforcement learning, aiming to balance server resource load and minimize server idle time while ensuring that user requests are not disconnected. The thesis first models the problem scenario, analyzes the impact of user connection duration on server idle time in the financial cloud scenario, and defines the optimization objective functions for load balancing and idle time. Three reinforcement learning algorithms are proposed to solve the problem: PPO-LB, IPG-LB, and MERL-LB. PPO-LB, based on the Proximal Policy Optimization (PPO) algorithm, optimizes both objectives and, while achieving good load balancing, reduces idle time by 20-30% compared with heuristic algorithms. IPG-LB, based on an input-independent policy gradient algorithm, improves PPO-LB's value estimation and increases training stability and convergence speed. MERL-LB uses an evolutionary reinforcement learning algorithm to optimize the neural network parameters so that a diverse set of policies can be generated simultaneously; experiments show that MERL-LB outperforms the other algorithms in both objective performance and policy diversity.
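As a minimal, hedged sketch of the kind of decision a PPO-style dispatch policy makes at each request arrival (all class names, state features, and the feasibility mask below are assumptions made for illustration; they are not the thesis's implementation), a stochastic server-selection policy in PyTorch could look like this, with the PPO training loop omitted:

# Illustrative sketch only: a stochastic policy that assigns an incoming user
# request to one of N servers. Feature choices and the masking rule are assumed.
import torch
import torch.nn as nn
from torch.distributions import Categorical


class DispatchPolicy(nn.Module):
    """Scores each server given its state features and the incoming request."""

    def __init__(self, server_feat_dim: int, request_feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(server_feat_dim + request_feat_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, 1),  # one logit per (server, request) pair
        )

    def forward(self, server_feats: torch.Tensor, request_feat: torch.Tensor,
                feasible: torch.Tensor) -> Categorical:
        # server_feats: (N, server_feat_dim); request_feat: (request_feat_dim,)
        n = server_feats.shape[0]
        x = torch.cat([server_feats, request_feat.expand(n, -1)], dim=-1)
        logits = self.net(x).squeeze(-1)
        # Mask servers that could not host the request without risking a disconnect.
        logits = logits.masked_fill(~feasible, float("-inf"))
        return Categorical(logits=logits)


if __name__ == "__main__":
    policy = DispatchPolicy(server_feat_dim=3, request_feat_dim=2)
    servers = torch.tensor([[0.2, 5.0, 1.0],    # e.g. utilization, active connections, up-flag
                            [0.9, 40.0, 1.0],
                            [0.0, 0.0, 0.0]])   # idle / draining server
    request = torch.tensor([0.1, 30.0])         # e.g. expected load, expected duration
    feasible = torch.tensor([True, True, False])
    dist = policy(servers, request, feasible)
    action = dist.sample()                      # server index chosen for this request
    log_prob = dist.log_prob(action)            # stored for the PPO clipped-ratio update
    print(int(action), float(log_prob))

During training, the sampled action's log-probability would feed a PPO-style clipped surrogate objective, with the load-balance and idle-time terms supplying the reward signal in whatever combination the multi-objective method uses.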

Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2020
Year Degree Conferred
2023-06

Degree Assessment Subcommittee
Electronic Science and Technology
Chinese Library Classification (CLC) Number
TP18
Source Repository
Manual submission
Item Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/543881
Collection
College of Engineering, Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
张烙铭. 面向金融云的多目标强化学习负载均衡算法[D]. 深圳: 南方科技大学, 2023.
Files in This Item
File Name/Size: 12032505-张烙铭-计算机科学与工 (3731 KB); Document Type: Thesis; Access: Restricted; License: CC BY-NC-SA
