Title

Anomaly Detection for Microservice Systems Based on Real-Time Feature Selection (基于实时特征选择的微服务系统异常检测)

Alternative title
EFFECTIVE ANOMALY DETECTION FOR MICROSERVICE SYSTEMS WITH REAL-TIME FEATURE SELECTION
Name
周思奇
Name (pinyin)
ZHOU Siqi
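Student ID
12032484
Degree type
Master's
Degree program
0809 Electronic Science and Technology
Discipline category / professional degree category
08 Engineering
Supervisor
刘烨庞 (LIU Yepang)
Supervisor's affiliation
Department of Computer Science and Engineering
Thesis defense date
2023-05-13
Thesis submission date
2023-06-27
Degree-granting institution
Southern University of Science and Technology
Place of degree conferral
Shenzhen
Abstract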
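In recent years, microservice architecture has become increasingly popular in backend development as a way to improve development efficiency and increase the load capacity of software systems. However, the complex topology, distributed deployment, and continuous integration of microservice systems mean that anomaly detection, fault analysis, and fault repair in day-to-day operations usually consume substantial human and material resources. AIOps (Artificial Intelligence for IT Operations) was proposed in response, aiming to use machine learning to automate IT operations and make them more intelligent, improving their efficiency and quality.

We first conduct an empirical study of existing multivariate time series anomaly detection algorithms on a real microservice dataset to evaluate their practical performance. We find a common problem: when processing the high-dimensional time series produced by microservice systems, the detection performance of each algorithm is sensitive to the threshold setting. That is, a small perturbation of the threshold can cause a large drop in detection performance. We design a metric, R-value, to measure threshold sensitivity: it uses Monte Carlo sampling to compute the average F1-score of the detection results under different threshold settings. After analyzing the dataset, we identify a potential cause: different microservice system faults affect different subsets of metrics, and an anomaly detection algorithm needs to search for these affected subsets in order to reduce the influence of irrelevant metrics on the detection results.

To address this problem, we propose a new anomaly detection framework, COAD (Combinatorial Optimization enhanced Anomaly Detection). By incorporating metaheuristic combinatorial optimization algorithms, COAD gives anomaly detection algorithms the ability to perform real-time feature selection (real-time dimensionality reduction), so that they produce relatively balanced anomaly scores across different faults. This mitigates the threshold sensitivity of the original algorithms and ultimately strengthens their detection capability.

Both the empirical study and the evaluation of COAD use the same real dataset, generated by three actually deployed microservice testbeds (the deployed framework is Google's open-source Hipstershop) and covering about 7 days of data in total. In evaluating COAD, we tested different algorithm combinations: the anomaly detection algorithms are KNN, GMM, and LOF, and the combinatorial optimization algorithms are GA, PSO, and AEO. The results show that (1) COAD significantly reduces the threshold sensitivity of the original detection algorithms (measured by R-value), by 142% on average; and (2) the best detection performance of the original algorithms also generally improves (measured by the F1-score at the best threshold setting), by 5.67% on average. These results demonstrate the effectiveness and potential usefulness of COAD. In future work, COAD can be applied to and evaluated on more algorithms and datasets, and its computational efficiency can be improved to reduce the time cost of real-time feature selection.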
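As an illustration of the R-value idea described in the abstract (the average F1-score over Monte Carlo sampled thresholds), the following is a minimal Python sketch. The abstract does not state the sampling distribution or the threshold range, so drawing thresholds uniformly between the minimum and maximum anomaly score is an assumption, and the function names `f1_score` and `r_value` and the toy data are hypothetical rather than taken from the thesis.

```python
import random


def f1_score(labels, predictions):
    """F1 for binary labels (1 = anomaly); returns 0.0 when there are no true positives."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def r_value(scores, labels, n_samples=1000, seed=0):
    """Mean F1 over thresholds sampled uniformly from [min(scores), max(scores)].

    The uniform sampling range is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    low, high = min(scores), max(scores)
    total = 0.0
    for _ in range(n_samples):
        threshold = rng.uniform(low, high)
        # A point is flagged as anomalous when its score exceeds the threshold.
        predictions = [1 if s > threshold else 0 for s in scores]
        total += f1_score(labels, predictions)
    return total / n_samples


# Toy data (hypothetical): the last two points are true anomalies.
labels = [0, 0, 0, 0, 1, 1]
balanced = [0.1, 0.2, 0.1, 0.2, 0.9, 0.9]    # both faults receive similar scores
unbalanced = [0.1, 0.2, 0.1, 0.2, 0.3, 0.9]  # one fault barely stands out

print("balanced:", r_value(balanced, labels))
print("unbalanced:", r_value(unbalanced, labels))
```

In this sketch, the balanced scores yield the higher mean F1 because most sampled thresholds separate both anomalies from the normal points, whereas with unbalanced scores only a narrow band of thresholds detects both faults. This mirrors the abstract's motivation for producing relatively balanced anomaly scores across faults.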
Keywords
Language
Chinese
Training category
Independent training
Year of enrollment
2020
Year of degree conferral
2023-06
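Degree evaluation subcommittee
Electronic Science and Technology
Chinese Library Classification number
TP311.5
Source database
Manual submission
Output type
Thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544038
Collection
College of Engineering_Department of Computer Science and Engineering
Recommended citation
GB/T 7714
ZHOU Siqi. Effective Anomaly Detection for Microservice Systems with Real-Time Feature Selection[D]. Shenzhen: Southern University of Science and Technology, 2023.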
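Files in this item
File name/size | Document type | Version type | Access type | License | Action
12032484-周思奇-计算机科学与工 (13938KB) | -- | -- | Restricted access | -- | Request full text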