Title

基于强化学习的QUIC拥塞控制性能诊断和优化

Alternative Title
DIAGNOSIS AND OPTIMIZATION OF QUIC CONGESTION CONTROL PERFORMANCE BASED ON REINFORCEMENT LEARNING
Name
潘知渊
Name (Pinyin)
PAN Zhiyuan
Student ID
12133090
Degree Type
Master
Degree Discipline
0809 Electronic Science and Technology
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
汪漪
Supervisor's Affiliation
Institute of Future Networks
Thesis Defense Date
2024-05-07
Thesis Submission Date
2024-06-24
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Location
Shenzhen
Abstract

Network congestion control is the foundation of Internet transmission. Congestion control mechanisms based on Deep Reinforcement Learning (DRL) do not need to change internal rule-based algorithms according to network characteristics; they only adjust the sending rate according to network feedback, and therefore adapt more readily to diverse networks. However, research on DRL-based congestion control is still at an early stage: its performance is unstable, which prevents large-scale deployment. This thesis aims to optimize the performance of DRL-based congestion control algorithms and improve their prospects for large-scale deployment.
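
As a rough illustration of this feedback loop, the sketch below shows a sending-rate controller that observes per-interval network statistics and applies a multiplicative rate adjustment. The `NetworkFeedback` and `RateController` names, the placeholder random action, and the reward weights are illustrative assumptions, not the thesis's design.

```python
# Minimal sketch of a DRL-style congestion-control loop: observe feedback,
# adjust the sending rate, compute a reward. All names and constants are
# illustrative, not taken from the thesis.
from dataclasses import dataclass
import random


@dataclass
class NetworkFeedback:
    """Per-interval statistics fed back to the agent (the state)."""
    throughput_mbps: float
    rtt_ms: float
    loss_rate: float


class RateController:
    """Agent that maps network feedback to a sending-rate adjustment."""

    def __init__(self, init_rate_mbps: float = 10.0):
        self.rate_mbps = init_rate_mbps

    def act(self, fb: NetworkFeedback) -> float:
        # A trained policy network would go here; this placeholder just
        # perturbs the rate multiplicatively (a common action space).
        action = random.uniform(-0.1, 0.1)
        self.rate_mbps *= (1.0 + action)
        return self.rate_mbps

    @staticmethod
    def reward(fb: NetworkFeedback) -> float:
        # Typical shape: reward throughput, penalize delay and loss.
        return fb.throughput_mbps - 0.1 * fb.rtt_ms - 10.0 * fb.loss_rate


if __name__ == "__main__":
    agent = RateController()
    fb = NetworkFeedback(throughput_mbps=9.2, rtt_ms=35.0, loss_rate=0.01)
    new_rate = agent.act(fb)
    print(f"reward={RateController.reward(fb):.2f}, next rate={new_rate:.2f} Mbps")
```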

Using a real testbed, this thesis analyzes the performance of existing DRL-based congestion control algorithms and identifies two performance bottlenecks. First, DRL's random exploration mechanism produces erroneous actions that severely degrade congestion control performance; guidance from rule-based algorithms can reduce the impact, but the rules in turn limit what DRL can achieve. Second, in multi-user heterogeneous networks the convergence speed of DRL models cannot meet the requirements of large-scale deployment, which manifests as loss of model memory caused by differences in datasets and network scenarios. Two performance optimization mechanisms are designed and implemented to address these findings.

Marten, an intelligent dynamic exploration scheme based on entropy and safe learning, is designed around three mechanisms: a safe-learning framework, entropy-based exploration, and an expert penalty. Together they resolve the conflict between the random exploration of reinforcement learning and the network's low tolerance for errors, and avoid the erroneous dependencies that deep reinforcement learning can develop. Marten is implemented on the LSQUIC platform; tests on real networks confirm that it effectively reduces the model's long-tail effect, uses entropy to achieve adaptive exploration across scenario switches, and prevents the erroneous-dependency problem. Compared with Eagle, Marten improves throughput by 0.36% and reduces latency by 14.89 ms; compared with BBR, it improves throughput by 2.79% and reduces latency by 11.73%.
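
The entropy-gated part of this idea can be sketched as follows, assuming a discrete set of candidate sending rates and a rule-based expert (e.g., a BBR-like rate estimate): when the policy's action distribution is close to uniform, the agent defers to the expert instead of exploring, and the reward is shaped with a penalty on deviation from the expert. The threshold, weights, and function names are illustrative, not Marten's actual LSQUIC implementation.

```python
# Hedged sketch of entropy-gated safe exploration with an expert penalty.
# All thresholds and the expert rate are illustrative assumptions.
import math
from typing import List


def entropy(probs: List[float]) -> float:
    """Shannon entropy of the policy's action distribution."""
    return -sum(p * math.log(p + 1e-12) for p in probs)


def choose_rate(policy_probs: List[float], candidate_rates: List[float],
                expert_rate: float, max_entropy_frac: float = 0.8) -> float:
    """Pick a sending rate; fall back to the expert when the policy is too uncertain."""
    h = entropy(policy_probs)
    h_max = math.log(len(policy_probs))          # entropy of a uniform distribution
    if h > max_entropy_frac * h_max:
        # Policy is close to uniform -> unsafe to explore freely; follow the expert.
        return expert_rate
    # Otherwise act greedily with respect to the learned policy.
    best = max(range(len(candidate_rates)), key=lambda i: policy_probs[i])
    return candidate_rates[best]


def shaped_reward(base_reward: float, chosen_rate: float, expert_rate: float,
                  penalty_weight: float = 0.05) -> float:
    """Expert penalty: discourage large deviations from the rule-based expert."""
    return base_reward - penalty_weight * abs(chosen_rate - expert_rate)


if __name__ == "__main__":
    rates = [5.0, 10.0, 20.0, 40.0]              # Mbps action space
    confident = [0.05, 0.85, 0.05, 0.05]
    uncertain = [0.25, 0.25, 0.25, 0.25]
    print(choose_rate(confident, rates, expert_rate=12.0))   # -> 10.0 (policy acts)
    print(choose_rate(uncertain, rates, expert_rate=12.0))   # -> 12.0 (expert fallback)
    print(shaped_reward(3.0, chosen_rate=40.0, expert_rate=12.0))
```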

Cavy, a multi-user heterogeneous fusion scheme based on meta-reinforcement learning, is designed using multi-threaded coordination with a locking mechanism, a meta-reinforcement learning algorithm, and a user-flow analysis mechanism to fit a dedicated policy to users in different network scenarios, improving convergence speed, generalization, and per-scenario performance. Cavy is implemented on the LSQUIC platform; tests on real networks verify that it adapts quickly to new scenarios while retaining memory of old ones, and that it provides high performance for every class of user in heterogeneous, multi-user settings. Compared with a conventional DRL model, Cavy improves throughput by 6.74% and reduces latency by 37.92%.
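
The fast-adaptation and memory-retention goal can be illustrated with a Reptile-style first-order meta-update on a toy per-scenario objective; this stands in for, and is not, Cavy's actual meta-reinforcement-learning algorithm, and every constant below is an assumption.

```python
# Hedged sketch of meta-learning a shared initialization that adapts quickly
# to heterogeneous user scenarios (Reptile-style first-order update on a toy task).
import random

random.seed(0)


def loss(theta: float, optimum: float) -> float:
    """Toy per-scenario objective: squared distance to that scenario's best rate."""
    return (theta - optimum) ** 2


def adapt(theta: float, optimum: float, lr: float = 0.1, steps: int = 5) -> float:
    """Inner loop: a few gradient steps specialize the shared init to one scenario."""
    for _ in range(steps):
        grad = 2.0 * (theta - optimum)
        theta -= lr * grad
    return theta


def meta_train(scenario_optima, meta_lr: float = 0.2, iters: int = 200) -> float:
    """Outer loop: move the shared initialization toward each adapted solution."""
    theta = 0.0
    for _ in range(iters):
        opt = random.choice(scenario_optima)     # sample a user/network scenario
        adapted = adapt(theta, opt)
        theta += meta_lr * (adapted - theta)     # Reptile-style meta-update
    return theta


if __name__ == "__main__":
    scenarios = [5.0, 20.0, 80.0]                # e.g. Mbps optima of heterogeneous users
    init = meta_train(scenarios)
    # A new scenario adapts in a handful of inner steps from the meta-learned init.
    print(f"meta-init={init:.1f}, adapted to a 40 Mbps scenario: {adapt(init, 40.0):.1f}")
```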

Keywords
Language
Chinese
Training Category
Independent Training
Year of Enrollment
2021
Year of Degree Conferral
2024-06
References

[1] MISHRA A, SUN X, JAIN A, et al. The great internet TCP congestion control census[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2019, 3(3): 1-24.

[2] BLUM N, LACHAPELLE S, ALVERSTRAND H. WebRTC: Real-time communication for the open web platform[J]. Communications of the ACM, 2021, 64(8): 50-54.

[3] LANGLEY A, RIDDOCH A, WILK A, et al. The QUIC transport protocol: Design and Internet-scale deployment[C]//Proceedings of the conference of the ACM special interest group on data communication. 2017: 183-196.

[4] JANSEN B, GOODWIN T, GUPTA V, et al. Performance evaluation of WebRTC-based video conferencing[J]. ACM SIGMETRICS Performance Evaluation Review, 2018, 45(3): 56-68.

[5] YANG F, WU Q, LI Z, et al. BBRv2+: Towards balancing aggressiveness and fairness with delay-based bandwidth probing[J]. Computer Networks, 2022, 206: 108789.

[6] HA S, RHEE I, XU L. CUBIC: a new TCP-friendly high-speed TCP variant[J]. ACM SIGOPS operating systems review, 2008, 42(5): 64-74.

[7] ABBASLOO S, XU Y, CHAO H J. C2TCP: A flexible cellular TCP to meet stringent delay requirements[J]. IEEE Journal on Selected Areas in Communications, 2019, 37(4): 918-932.

[8] ALIZADEH M, GREENBERG A, MALTZ D A, et al. Data center TCP (DCTCP)[C]//Proceedings of the ACM SIGCOMM 2010 Conference. 2010: 63-74.

[9] ARUN V, BALAKRISHNAN H. Copa: Practical delay-based congestion control for the internet[C]//15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 2018: 329-342.

[10] MITTAL R, LAM V T, DUKKIPATI N, et al. TIMELY: RTT-based congestion control for the datacenter[J]. ACM SIGCOMM Computer Communication Review, 2015, 45(4): 537-550.

[11] BRAGG J, MAUSAM, WELD D S. Sprout: Crowd-powered task design for crowdsourcing[C]//Proceedings of the 31st annual acm symposium on user interface software and technology. 2018: 165-176.

[12] ABBASLOO S, YEN C Y, CHAO H J. Classic meets modern: A pragmatic learning-based congestion control for the internet[C]//Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication. 2020: 632-647.

[13] JAY N, ROTMAN N, GODFREY B, et al. A deep reinforcement learning perspective on internet congestion control[C]//International Conference on Machine Learning. PMLR, 2019: 3050-3059.

[14] MA Y, TIAN H, LIAO X, et al. Multi-objective congestion control[C]//Proceedings of the Seventeenth European Conference on Computer Systems. 2022: 218-235.

[15] LI X, TANG F, LIU J, et al. AUTO: Adaptive congestion control based on multi-objective reinforcement learning for the satellite-ground integrated network[C]//2021 USENIX Annual Technical Conference (USENIX ATC 21). 2021: 611-624.

[16] DONG M, MENG T, ZARCHY D, et al. PCC Vivace: Online-learning congestion control[C]//15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). 2018: 343-356.

[17] ABBASLOO S, YEN C Y, CHAO H J. Wanna make your TCP scheme great for cellular networks? Let machines do it for you![J]. IEEE Journal on Selected Areas in Communications, 2020, 39(1): 265-279.

[18] EMARA S, LI B, CHEN Y. Eagle: Refining congestion control by learning from the experts[C]//IEEE INFOCOM 2020-IEEE Conference on Computer Communications. IEEE, 2020: 676-685.

[19] ZHANG H, ZHOU A, HU Y, et al. Loki: improving long tail performance of learning-based real-time video adaptation by fusing rule-based models[C]//Proceedings of the 27th Annual International Conference on Mobile Computing and Networking. 2021: 775-788.

[20] ZHANG H, ZHOU A, LU J, et al. OnRL: improving mobile video telephony via online reinforcement learning[C]//Proceedings of the 26th Annual International Conference on Mobile Computing and Networking. 2020: 1-14.

[21] LI W, GAO S, LI X, et al. TCP-NeuroC: Neural adaptive TCP congestion control with online changepoint detection[J]. IEEE Journal on Selected Areas in Communications, 2021, 39(8): 2461-2475.

[22] WANG B, ZHANG Y, QIAN S, et al. A hybrid receiver-side congestion control scheme for web real-time communication[C]//Proceedings of the 12th ACM Multimedia Systems Conference. 2021: 332-338.

[23] DU Z, ZHENG J, YU H, et al. A unified congestion control framework for diverse application preferences and network conditions[C]//Proceedings of the 17th International Conference on emerging Networking EXperiments and Technologies. 2021: 282-296.

[24] BRAKMO L S, O'MALLEY S W, PETERSON L L. TCP Vegas: New techniques for congestion detection and avoidance[C]//Proceedings of the conference on Communications architectures, protocols and applications. 1994: 24-35.

[25] WINSTEIN K, BALAKRISHNAN H. TCP ex machina: Computer-generated congestion control[J]. ACM SIGCOMM Computer Communication Review, 2013, 43(4): 123-134.

[26] SIVAKUMAR V, DELALLEAU O, ROCKTÄSCHEL T, et al. MVFST-RL: An asynchronous RL framework for congestion control with delayed actions[J]. arXiv preprint arXiv:1910.04054, 2019.

[27] 赖涵光, 李清, 江勇. 基于场景变化的传输控制协议拥塞控制切换方案[J]. 计算机应用, 2022, 42(4): 1225.

[28] ZHENG Y, CHEN H, DUAN Q, et al. Leveraging domain knowledge for robust deep reinforcement learning in networking[C]//IEEE INFOCOM 2021-IEEE Conference on Computer Communications. IEEE, 2021: 1-10.

[29] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//International conference on machine learning. PMLR, 2014: 387-395.

[30] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.

[31] VAN HASSELT H, GUEZ A, SILVER D. Deep reinforcement learning with double q-learning[C]//Proceedings of the AAAI conference on artificial intelligence. 2016, 30(1).

[32] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv:1509.02971, 2015.

[33] SCHULMAN J, LEVINE S, ABBEEL P, et al. Trust region policy optimization[C]//International conference on machine learning. PMLR, 2015: 1889-1897.

[34] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//International conference on machine learning. PMLR, 2016: 1928-1937.

[35] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv:1707.06347, 2017.

[36] FUJIMOTO S, HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//International conference on machine learning. PMLR, 2018: 1587-1596.

[37] GARCIA J, FERNÁNDEZ F. A comprehensive survey on safe reinforcement learning[J]. Journal of Machine Learning Research, 2015, 16(1): 1437-1480.

[38] MAO H, SCHWARZKOPF M, HE H, et al. Towards safe online reinforcement learning in computer systems[C]//NeurIPS Machine Learning for Systems Workshop. 2019.

[39] ALSHIEKH M, BLOEM R, EHLERS R, et al. Safe reinforcement learning via shielding[C]//Proceedings of the AAAI conference on artificial intelligence. 2018, 32(1).

[40] THOMAS G, LUO Y, MA T. Safe reinforcement learning by imagining the near future[J]. Advances in Neural Information Processing Systems, 2021, 34: 13859-13869.

[41] FULTON N, PLATZER A. Safe reinforcement learning via formal methods: Toward safe control through proof and learning[C]//Proceedings of the AAAI Conference on Artificial Intelligence. 2018, 32(1).

[42] BECK J, VUORIO R, LIU E Z, et al. A survey of meta-reinforcement learning[J]. arXiv preprint arXiv:2301.08028, 2023.

[43] DUAN Y, SCHULMAN J, CHEN X, et al. RL$^2$: Fast reinforcement learning via slow reinforcement learning[J]. arXiv preprint arXiv:1611.02779, 2016.

[44] FINN C, ABBEEL P, LEVINE S. Model-agnostic meta-learning for fast adaptation of deep networks[C]//International conference on machine learning. PMLR, 2017: 1126-1135.

[45] GUPTA A, MENDONCA R, LIU Y X, et al. Meta-reinforcement learning of structured exploration strategies[J]. Advances in neural information processing systems, 2018, 31.

[46] RAKELLY K, ZHOU A, FINN C, et al. Efficient off-policy meta-reinforcement learning via probabilistic context variables[C]//International conference on machine learning. PMLR, 2019: 5331-5340.

[47] TIAN H, LIAO X, ZENG C, et al. Spine: an efficient DRL-based congestion control with ultra-low overhead[C]//Proceedings of the 18th International Conference on Emerging Networking EXperiments and Technologies. 2022: 261-275.

[48] ZHANG J, ZENG C, ZHANG H, et al. Liteflow: towards high-performance adaptive neural networks for kernel datapath[C]//Proceedings of the ACM SIGCOMM 2022 Conference. 2022: 414-427.

[49] WANG Y, CHEN K, TAN H, et al. Tabi: An Efficient Multi-Level Inference System for Large Language Models[C]//Proceedings of the Eighteenth European Conference on Computer Systems. 2023: 233-248.

[50] AKGUN I U, AYDIN A S, ZADOK E. KMLIB: Towards Machine Learning for Operating Systems[C]//Proceedings of the On-Device Intelligence Workshop, co-located with the MLSys Conference. 2020: 1-6.

[51] BROCKMAN G, CHEUNG V, PETTERSSON L, et al. OpenAI Gym[J]. arXiv preprint arXiv:1606.01540, 2016.

[52] NETRAVALI R, SIVARAMAN A, DAS S, et al. Mahimahi: Accurate record-and-replay for HTTP[C]//2015 USENIX Annual Technical Conference (USENIX ATC 15). 2015: 417-429.

[53] ABBASLOO S, YEN C Y, CHAO H J. Make TCP great (again?!) in cellular networks: A deep reinforcement learning approach[J]. arXiv preprint arXiv:1912.11735, 2019.

[54] FLOYD S. Metrics for the evaluation of congestion control mechanisms[R]. 2008.

[55] GIESSLER A, HAENLE J, KÖNIG A, et al. Free buffer allocation—An investigation by simulation[J]. Computer Networks (1976), 1978, 2(3): 191-208.

Degree Assessment Subcommittee
Electronic Science and Technology
Chinese Library Classification Number
TP393.0
Source Repository
Manual submission
Output Type
Degree thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/766025
Collection
Southern University of Science and Technology, Institute of Future Networks
Recommended Citation
GB/T 7714
潘知渊. 基于强化学习的QUIC拥塞控制性能诊断和优化[D]. 深圳: 南方科技大学, 2024.