Title

A Multi-service Multi-user Collaborative Inference Framework in Edge AI

Alternative Title
一种基于边缘智能的多业务多用户协同推理框架
Name
Name (Pinyin)
SHEN Jingran
Student ID
12132354
Degree Type
Master
Degree Discipline
0809 Electronic Science and Technology
Subject Category
08 Engineering
Supervisor
Georgios Theodoropoulos
Supervisor's Affiliation
Department of Computer Science and Engineering
Thesis Defense Date
2024-05-12
Thesis Submission Date
2024-07-02
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract

Edge AI, the combination of Artificial Intelligence and Edge Computing, has enabled an ever-increasing number of modern applications such as autonomous vehicles, production-line automation, and augmented reality. Nevertheless, Deep Neural Network (DNN) models in particular place heavy demands on both computing and memory resources. One solution is to partition the models across resource-constrained edge servers, following the Collaborative Inference paradigm. Moreover, in real-world deployments, edge servers typically have to handle multiple models, each representing a service, and multiple user requests at the same time. Since existing research does not examine these environment settings as a whole, this thesis addresses the challenge by designing a holistic Multi-service Multi-user Collaborative Inference framework that organically integrates three related scheduling problems: (i) Server Allocation, (ii) Model Partitioning, and (iii) Data Batching. The proposed framework facilitates Edge AI through (i) algorithm interactions that dynamically improve the corresponding solutions, (ii) a model blueprint dedicated to the partitioning purpose, (iii) a generalized Inference Profiler for flexible and efficient prediction of module inference latency, and (iv) a DNN Partitioner that actualizes the partition plan and constructs a distributed version of the model, delivering a system that fully supports Multi-service Multi-user Collaborative Inference in edge environments. In addition, an advanced Inference Profiler architecture from the designed framework is implemented; it employs a customizable Regression Model (RM) training workflow and produces a set of trained RMs that achieve the highest possible overall prediction accuracy while keeping prediction time and space consumption as low as possible. Furthermore, a Multi-task Encoder-Decoder Network (MEDN) is proposed as an alternative RM solution.
Comprehensive experimental results show that MEDN is fast and lightweight, and achieves the highest overall prediction accuracy and R-squared value. The Time/Space-efficient Auto-selection algorithm further improves overall accuracy by 2.5% and R-squared by 0.39% compared with the MEDN single-selection scheme.
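To make the auto-selection idea concrete, the following is a minimal, hypothetical sketch (not the thesis's actual algorithm): for each model module, it picks the candidate RM with the highest prediction accuracy, breaking ties by lower prediction time and then lower memory footprint. All names and numbers here are illustrative assumptions.

```python
# Hypothetical sketch of a time/space-efficient RM auto-selection scheme.
# For each module, choose the regression model (RM) with the highest
# accuracy; ties are broken by lower prediction time, then lower space.
# Candidate names and profiling numbers below are purely illustrative.

def select_rms(profiles):
    """profiles: {module: [(rm_name, accuracy, time_ms, space_mb), ...]}
    Returns {module: rm_name} maximizing accuracy, then minimizing
    time and space via negated tie-breakers in the sort key."""
    selection = {}
    for module, candidates in profiles.items():
        best = max(candidates, key=lambda c: (c[1], -c[2], -c[3]))
        selection[module] = best[0]
    return selection

profiles = {
    # Equal accuracy: the cheaper LinearRM should win for conv_block.
    "conv_block": [("MEDN", 0.95, 1.2, 10.0), ("LinearRM", 0.95, 0.4, 1.0)],
    # Higher accuracy dominates cost: MEDN should win for fc_head.
    "fc_head":    [("MEDN", 0.90, 1.1, 10.0), ("TreeRM", 0.88, 0.2, 2.0)],
}
print(select_rms(profiles))  # {'conv_block': 'LinearRM', 'fc_head': 'MEDN'}
```

A per-module selection like this is what allows a mixed set of trained RMs to beat any single-selection scheme on overall accuracy while keeping prediction cost low.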

Keywords
Language
English
Training Category
Independently trained
Year of Enrollment
2021
Year of Degree Conferral
2024-06

Degree Evaluation Subcommittee
Electronic Science and Technology
Chinese Library Classification
TP181
Source Repository
Manual submission
Output Type: Degree thesis
Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/778853
Collection: College of Engineering / Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
Shen JR. A Multi-service Multi-user Collaborative Inference Framework in Edge AI[D]. Shenzhen: Southern University of Science and Technology, 2024.
Files in This Item
File name/size | Document type | Version type | Access | License
12132354-沈静然-计算机科学与工(3447KB): restricted access (request full text)

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.