Title

基于强化学习的QUIC拥塞控制性能诊断和优化

Alternative Title
DIAGNOSIS AND OPTIMIZATION OF QUIC CONGESTION CONTROL PERFORMANCE BASED ON REINFORCEMENT LEARNING
Name
潘知渊
Name (Pinyin)
PAN Zhiyuan
Student ID
12133090
Degree Type
Master
Degree Discipline
0809 Electronic Science and Technology
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
汪漪
Supervisor's Affiliation
Institute of Future Networks
Thesis Defense Date
2024-05-07
Thesis Submission Date
2024-06-24
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Location
Shenzhen
Abstract

Network congestion control is the foundation of Internet transmission. Congestion control mechanisms based on Deep Reinforcement Learning (DRL) do not need to change internal rule-based algorithms according to network characteristics; they only adjust the sending rate according to network feedback, and therefore adapt more readily to diverse networks. However, research on DRL-based congestion control is still at an early stage: its performance is unstable, which prevents large-scale deployment. This thesis aims to optimize the performance of DRL-based congestion control algorithms and improve their prospects for large-scale deployment.
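
As a rough illustration of this feedback loop, the sketch below shows a sending-rate controller that observes per-interval network statistics and applies a multiplicative rate adjustment. The `NetworkFeedback` and `RateController` names, the placeholder random action, and the reward weights are illustrative assumptions, not the thesis's design.

```python
# Minimal sketch of a DRL-style congestion-control loop: observe feedback,
# adjust the sending rate, compute a reward. All names and constants are
# illustrative, not taken from the thesis.
from dataclasses import dataclass
import random


@dataclass
class NetworkFeedback:
    """Per-interval statistics fed back to the agent (the state)."""
    throughput_mbps: float
    rtt_ms: float
    loss_rate: float


class RateController:
    """Agent that maps network feedback to a sending-rate adjustment."""

    def __init__(self, init_rate_mbps: float = 10.0):
        self.rate_mbps = init_rate_mbps

    def act(self, fb: NetworkFeedback) -> float:
        # A trained policy network would go here; this placeholder just
        # perturbs the rate multiplicatively (a common action space).
        action = random.uniform(-0.1, 0.1)
        self.rate_mbps *= (1.0 + action)
        return self.rate_mbps

    @staticmethod
    def reward(fb: NetworkFeedback) -> float:
        # Typical shape: reward throughput, penalize delay and loss.
        return fb.throughput_mbps - 0.1 * fb.rtt_ms - 10.0 * fb.loss_rate


if __name__ == "__main__":
    agent = RateController()
    fb = NetworkFeedback(throughput_mbps=9.2, rtt_ms=35.0, loss_rate=0.01)
    new_rate = agent.act(fb)
    print(f"reward={RateController.reward(fb):.2f}, next rate={new_rate:.2f} Mbps")
```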

Using a real testbed, this thesis analyzes the performance of existing DRL-based congestion control algorithms and identifies two performance bottlenecks. First, DRL's random exploration mechanism produces erroneous actions that severely degrade congestion control performance; guidance from rule-based algorithms can reduce the impact, but the rules in turn limit what DRL can achieve. Second, in multi-user heterogeneous networks the convergence speed of DRL models cannot meet the requirements of large-scale deployment, which manifests as loss of model memory caused by differences in datasets and network scenarios. Two performance optimization mechanisms are designed and implemented to address these findings.

Marten, an intelligent dynamic exploration scheme based on entropy and safe learning, is designed around three mechanisms: a safe-learning framework, entropy-based exploration, and an expert penalty. Together they resolve the conflict between the random exploration of reinforcement learning and the network's low tolerance for errors, and avoid the erroneous dependencies that deep reinforcement learning can develop. Marten is implemented on the LSQUIC platform; tests on real networks confirm that it effectively reduces the model's long-tail effect, uses entropy to achieve adaptive exploration across scenario switches, and prevents the erroneous-dependency problem. Compared with Eagle, Marten improves throughput by 0.36% and reduces latency by 14.89 ms; compared with BBR, it improves throughput by 2.79% and reduces latency by 11.73%.
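
The entropy-gated part of this idea can be sketched as follows, assuming a discrete set of candidate sending rates and a rule-based expert (e.g., a BBR-like rate estimate): when the policy's action distribution is close to uniform, the agent defers to the expert instead of exploring, and the reward is shaped with a penalty on deviation from the expert. The threshold, weights, and function names are illustrative, not Marten's actual LSQUIC implementation.

```python
# Hedged sketch of entropy-gated safe exploration with an expert penalty.
# All thresholds and the expert rate are illustrative assumptions.
import math
from typing import List


def entropy(probs: List[float]) -> float:
    """Shannon entropy of the policy's action distribution."""
    return -sum(p * math.log(p + 1e-12) for p in probs)


def choose_rate(policy_probs: List[float], candidate_rates: List[float],
                expert_rate: float, max_entropy_frac: float = 0.8) -> float:
    """Pick a sending rate; fall back to the expert when the policy is too uncertain."""
    h = entropy(policy_probs)
    h_max = math.log(len(policy_probs))          # entropy of a uniform distribution
    if h > max_entropy_frac * h_max:
        # Policy is close to uniform -> unsafe to explore freely; follow the expert.
        return expert_rate
    # Otherwise act greedily with respect to the learned policy.
    best = max(range(len(candidate_rates)), key=lambda i: policy_probs[i])
    return candidate_rates[best]


def shaped_reward(base_reward: float, chosen_rate: float, expert_rate: float,
                  penalty_weight: float = 0.05) -> float:
    """Expert penalty: discourage large deviations from the rule-based expert."""
    return base_reward - penalty_weight * abs(chosen_rate - expert_rate)


if __name__ == "__main__":
    rates = [5.0, 10.0, 20.0, 40.0]              # Mbps action space
    confident = [0.05, 0.85, 0.05, 0.05]
    uncertain = [0.25, 0.25, 0.25, 0.25]
    print(choose_rate(confident, rates, expert_rate=12.0))   # -> 10.0 (policy acts)
    print(choose_rate(uncertain, rates, expert_rate=12.0))   # -> 12.0 (expert fallback)
    print(shaped_reward(3.0, chosen_rate=40.0, expert_rate=12.0))
```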

Cavy, a multi-user heterogeneous fusion scheme based on meta-reinforcement learning, is designed using multi-threaded coordination with a locking mechanism, a meta-reinforcement learning algorithm, and a user-flow analysis mechanism to fit a dedicated policy to users in different network scenarios, improving convergence speed, generalization, and per-scenario performance. Cavy is implemented on the LSQUIC platform; tests on real networks verify that it adapts quickly to new scenarios while retaining memory of old ones, and that it provides high performance for every class of user in heterogeneous, multi-user settings. Compared with a conventional DRL model, Cavy improves throughput by 6.74% and reduces latency by 37.92%.
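
The fast-adaptation and memory-retention goal can be illustrated with a Reptile-style first-order meta-update on a toy per-scenario objective; this stands in for, and is not, Cavy's actual meta-reinforcement-learning algorithm, and every constant below is an assumption.

```python
# Hedged sketch of meta-learning a shared initialization that adapts quickly
# to heterogeneous user scenarios (Reptile-style first-order update on a toy task).
import random

random.seed(0)


def loss(theta: float, optimum: float) -> float:
    """Toy per-scenario objective: squared distance to that scenario's best rate."""
    return (theta - optimum) ** 2


def adapt(theta: float, optimum: float, lr: float = 0.1, steps: int = 5) -> float:
    """Inner loop: a few gradient steps specialize the shared init to one scenario."""
    for _ in range(steps):
        grad = 2.0 * (theta - optimum)
        theta -= lr * grad
    return theta


def meta_train(scenario_optima, meta_lr: float = 0.2, iters: int = 200) -> float:
    """Outer loop: move the shared initialization toward each adapted solution."""
    theta = 0.0
    for _ in range(iters):
        opt = random.choice(scenario_optima)     # sample a user/network scenario
        adapted = adapt(theta, opt)
        theta += meta_lr * (adapted - theta)     # Reptile-style meta-update
    return theta


if __name__ == "__main__":
    scenarios = [5.0, 20.0, 80.0]                # e.g. Mbps optima of heterogeneous users
    init = meta_train(scenarios)
    # A new scenario adapts in a handful of inner steps from the meta-learned init.
    print(f"meta-init={init:.1f}, adapted to a 40 Mbps scenario: {adapt(init, 40.0):.1f}")
```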

Keywords
Language
Chinese
Training Category
Independent Training
Year of Enrollment
2021
Year of Degree Conferral
2024-06
References

[1] MISHRA A, SUN X, JAIN A, et al. The great internet TCP congestion control census[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2019, 3(3): 1-24.

[2] BLUM N, LACHAPELLE S, ALVERSTRAND H. WebRTC: Real-time communication for the open web platform[J]. Communications of the ACM, 2021, 64(8): 50-54.

[3] LANGLEY A, RIDDOCH A, WILK A, et al. The QUIC transport protocol: Design and Internet-scale deployment[C]//Proceedings of the conference of the ACM special interest group on data communication. 2017: 183-196.

[4] JANSEN B, GOODWIN T, GUPTA V, et al. Performance evaluation of WebRTC-based video conferencing[J]. ACM SIGMETRICS Performance Evaluation Review, 2018, 45(3): 56-68.

[5] YANG F, WU Q, LI Z, et al. BBRv2+: Towards balancing aggressiveness and fairness with delay-based bandwidth probing[J]. Computer Networks, 2022, 206: 108789.

[6] HA S, RHEE I, XU L. CUBIC: a new TCP-friendly high-speed TCP variant[J]. ACM SIGOPS operating systems review, 2008, 42(5): 64-74.

[7] ABBASLOO S, XU Y, CHAO H J. C2TCP: A flexible cellular TCP to meet stringent delay requirements[J]. IEEE Journal on Selected Areas in Communications, 2019, 37(4): 918-932.

[8] ALIZADEH M, GREENBERG A, MALTZ D A, et al. Data center TCP (DCTCP)[C]//Proceedings of the ACM SIGCOMM 2010 Conference. 2010: 63-74.

[9] ARUN V, BALAKRISHNAN H. Copa: Practical delay-based congestion control for the internet[C]//15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 2018: 329-342.

[10] MITTAL R, LAM V T, DUKKIPATI N, et al. TIMELY: RTT-based congestion control for the datacenter[J]. ACM SIGCOMM Computer Communication Review, 2015, 45(4): 537-550.

[11] BRAGG J, MAUSAM, WELD D S. Sprout: Crowd-powered task design for crowdsourcing[C]//Proceedings of the 31st annual acm symposium on user interface software and technology. 2018: 165-176.

[12] ABBASLOO S, YEN C Y, CHAO H J. Classic meets modern: A pragmatic learning-based congestion control for the internet[C]//Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication. 2020: 632-647.

[13] JAY N, ROTMAN N, GODFREY B, et al. A deep reinforcement learning perspective on internet congestion control[C]//International Conference on Machine Learning. PMLR, 2019: 3050-3059.

[14] MA Y, TIAN H, LIAO X, et al. Multi-objective congestion control[C]//Proceedings of the Seventeenth European Conference on Computer Systems. 2022: 218-235.

[15] LI X, TANG F, LIU J, et al. AUTO: Adaptive congestion control based on multi-objective reinforcement learning for the satellite-ground integrated network[C]//2021 USENIX Annual Technical Conference (USENIX ATC 21). 2021: 611-624.

[16] DONG M, MENG T, ZARCHY D, et al. PCC Vivace: Online-learning congestion control[C]//15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 2018: 343-356.

[17] ABBASLOO S, YEN C Y, CHAO H J. Wanna make your TCP scheme great for cellular networks? Let machines do it for you![J]. IEEE Journal on Selected Areas in Communications, 2020, 39(1): 265-279.

[18] EMARA S, LI B, CHEN Y. Eagle: Refining congestion control by learning from the experts[C]//IEEE INFOCOM 2020-IEEE Conference on Computer Communications. IEEE, 2020: 676-685.

[19] ZHANG H, ZHOU A, HU Y, et al. Loki: improving long tail performance of learning-based real-time video adaptation by fusing rule-based models[C]//Proceedings of the 27th Annual International Conference on Mobile Computing and Networking. 2021: 775-788.

[20] ZHANG H, ZHOU A, LU J, et al. OnRL: improving mobile video telephony via online reinforcement learning[C]//Proceedings of the 26th Annual International Conference on Mobile Computing and Networking. 2020: 1-14.

[21] LI W, GAO S, LI X, et al. TCP-NeuroC: Neural adaptive TCP congestion control with online changepoint detection[J]. IEEE Journal on Selected Areas in Communications, 2021, 39(8): 2461-2475.

[22] WANG B, ZHANG Y, QIAN S, et al. A hybrid receiver-side congestion control scheme for web real-time communication[C]//Proceedings of the 12th ACM Multimedia Systems Conference. 2021: 332-338.

[23] DU Z, ZHENG J, YU H, et al. A unified congestion control framework for diverse application preferences and network conditions[C]//Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies. 2021: 282-296.

[24] BRAKMO L S, O'MALLEY S W, PETERSON L L. TCP Vegas: New techniques for congestion detection and avoidance[C]//Proceedings of the conference on Communications architectures, protocols and applications. 1994: 24-35.

[25] WINSTEIN K, BALAKRISHNAN H. TCP ex machina: Computer-generated congestion control[J]. ACM SIGCOMM Computer Communication Review, 2013, 43(4): 123-134.

[26] SIVAKUMAR V, DELALLEAU O, ROCKTÄSCHEL T, et al. MVFST-RL: An asynchronous RL framework for congestion control with delayed actions[J]. arXiv preprint arXiv:1910.04054, 2019.

[27] 赖涵光, 李清, 江勇. 基于场景变化的传输控制协议拥塞控制切换方案[J]. 计算机应用, 2022, 42(4): 1225.

[28] ZHENG Y, CHEN H, DUAN Q, et al. Leveraging domain knowledge for robust deep reinforcement learning in networking[C]//IEEE INFOCOM 2021-IEEE Conference on Computer Communications. IEEE, 2021: 1-10.

[29] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//International conference on machine learning. PMLR, 2014: 387-395.

[30] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.

[31] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double q-learning[C]//Proceedings of the AAAI conference on artificial intelligence. 2016, 30(1).

[32] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.

[33] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//International conference on machine learning. PMLR, 2015: 1889-1897.

[34] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//International conference on machine learning. PMLR, 2016: 1928-1937.

[35] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.

[36] FUJIMOTO S, HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//International conference on machine learning. PMLR, 2018: 1587-1596.

[37] GARCIA J, FERNÁNDEZ F. A comprehensive survey on safe reinforcement learning[J]. Journal of Machine Learning Research, 2015, 16(1): 1437-1480.

[38] MAO H, SCHWARZKOPF M, HE H, et al. Towards safe online reinforcement learning in computer systems[C]//NeurIPS Machine Learning for Systems Workshop. 2019.

[39] ALSHIEKH M, BLOEM R, EHLERS R, et al. Safe reinforcement learning via shielding[C]//Proceedings of the AAAI conference on artificial intelligence. 2018, 32(1).

[40] THOMAS G, LUO Y, MA T. Safe reinforcement learning by imagining the near future[J]. Advances in Neural Information Processing Systems, 2021, 34: 13859-13869.

[41] FULTON N, PLATZER A. Safe reinforcement learning via formal methods: Toward safe control through proof and learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1).

[42] BECK J, VUORIO R, LIU E Z, et al. A survey of meta-reinforcement learning[J]. arXiv preprint arXiv:2301.08028, 2023.

[43] DUAN Y, SCHULMAN J, CHEN X, et al. RL$^2$: Fast reinforcement learning via slow reinforcement learning[J]. arXiv preprint arXiv:1611.02779, 2016.

[44] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//International conference on machine learning. PMLR, 2017: 1126-1135.

[45] GUPTA A, MENDONCA R, LIU Y X, et al. Meta-reinforcement learning of structured exploration strategies[J]. Advances in neural information processing systems, 2018, 31.

[46] RAKELLY K, ZHOU A, FINN C, et al. Efficient off-policy meta-reinforcement learning via probabilistic context variables[C]//International conference on machine learning. PMLR, 2019: 5331-5340.

[47] TIAN H, LIAO X, ZENG C, et al. Spine: an efficient DRL-based congestion control with ultra-low overhead[C]//Proceedings of the 18th International Conference on Emerging Networking EXperiments and Technologies. 2022: 261-275.

[48] ZHANG J, ZENG C, ZHANG H, et al. Liteflow: towards high-performance adaptive neural networks for kernel datapath[C]//Proceedings of the ACM SIGCOMM 2022 Conference. 2022: 414-427.

[49] WANG Y, CHEN K, TAN H, et al. Tabi: An Efficient Multi-Level Inference System for Large Language Models[C]//Proceedings of the Eighteenth European Conference on Computer Systems. 2023: 233-248.

[50] AKGUN I U, AYDIN A S, ZADOK E. KMLIB: Towards Machine Learning for Operating Systems[C]//Proceedings of the On-Device Intelligence Workshop, co-located with the MLSys Conference. 2020: 1-6.

[51] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[J]. arXiv preprint arXiv:1606.01540, 2016.

[52] NETRAVALI R, SIVARAMAN A, DAS S, et al. Mahimahi: Accurate record-and-replay for HTTP[C]//2015 USENIX Annual Technical Conference (USENIX ATC 15). 2015: 417-429.

[53] ABBASLOO S, YEN C Y, CHAO H J. Make TCP great (again?!) in cellular networks: A deep reinforcement learning approach[J]. arXiv preprint arXiv:1912.11735, 2019.

[54] FLOYD S. Metrics for the evaluation of congestion control mechanisms[R]. 2008.

[55] GIESSLER A, HAENLE J, KÖNIG A, et al. Free buffer allocation—An investigation by simulation[J]. Computer Networks (1976), 1978, 2(3): 191-208.

Degree Assessment Subcommittee
Electronic Science and Technology
Chinese Library Classification Number
TP393.0
Source Repository
Manual submission
Output Type
Degree thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/766025
Collection
Southern University of Science and Technology, Institute of Future Networks
Recommended Citation
GB/T 7714
潘知渊. 基于强化学习的QUIC拥塞控制性能诊断和优化[D]. 深圳: 南方科技大学, 2024.