Title

面向数据紧缺和约束违反问题的即时软件缺陷预测方法研究

Alternative Title
JUST-IN-TIME SOFTWARE DEFECT PREDICTION APPROACH FOR DATA SHORTAGE AND CONSTRAINT VIOLATION
Name
滕聪
Name (Pinyin)
TENG Cong
Student ID
12132358
Degree Type
Master's
Degree Discipline
0809 Electronic Science and Technology
Discipline Category / Professional Degree Category
08 Engineering
Supervisor
宋丽妍
Supervisor's Department
Department of Computer Science and Engineering
Thesis Defense Date
2024-05-12
Thesis Submission Date
2024-06-24
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract

In software development, Just-In-Time Software Defect Prediction (JIT-SDP) predicts, for each software change a developer commits, whether the change is likely to introduce a defect, so that defects can be caught immediately, repair costs reduced, and software quality improved. A large body of work already exists on JIT-SDP; this thesis focuses on two of its problems: data shortage in the online scenario and constraint violation in a constrained-optimization scenario.

The data shortage problem of online JIT-SDP deserves study. In the early stage of a project, there is not enough data to build a model with good predictive performance, and during the sharp performance drops caused by concept drift, the model lacks enough data to adapt quickly to the new environment. Cross-project methods, which supplement the training data with data from external projects, are commonly used to alleviate this shortage. However, the state-of-the-art online cross-project methods, All-in-One (AIO) and Filtering, have limitations: the former adds irrelevant data and thus lacks effectiveness, while the latter restricts the added data to a narrow range and thus lacks diversity. Neither exploits external data fully, leaving room for improvement. This thesis proposes an online, project-level cross-project method that accounts for inter-project similarity, together with an improved version, so that the added data are both diverse and effective. Experiments on 23 open-source project datasets show that the method significantly improves within-project predictive performance and is significantly better than or comparable to the state-of-the-art online cross-project methods, and that its improved version significantly outperforms all the other methods.

The constraint violation problem faced in a specific constrained-optimization scenario of JIT-SDP also deserves study. In this scenario, the goal is to pursue stronger predictive ability on the defect class while limiting the model's misclassification of the clean (non-defect) class. Such an application bounds the wasted effort developers spend inspecting clean code, can be adapted to the needs of different companies, and therefore has practical significance and research value. Owing to the unavoidable data shift in JIT-SDP, existing methods suffer from rather severe constraint violations when tackling this constrained-optimization problem. This thesis adopts a differential evolution algorithm that sets an adaptive constraint value to give the model stronger constraint-satisfaction ability. Experiments on 10 open-source project datasets show that, compared with existing algorithms, differential evolution with an adaptive constraint value performs better at both satisfying the constraint and pursuing defect-class prediction.
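To make the project-level selection idea above concrete, here is a minimal illustrative sketch, not the thesis's actual algorithm: each external project is summarized by the mean of its recent change metrics (for example, the 14 change metrics of Kamei et al. [10]), and only projects whose profile lies close to the target project's are admitted into the online training data. The function names, the Euclidean distance measure, and the threshold are assumptions made purely for illustration.

```python
# Illustrative sketch only: project-level selection of cross-project data
# by similarity between change-metric profiles. Not the thesis's algorithm.
import numpy as np

def project_profile(changes: np.ndarray) -> np.ndarray:
    """Summarize a project by the mean of its recent change metrics."""
    return changes.mean(axis=0)

def select_external_projects(target_changes, external_projects, max_dist=1.0):
    """Keep only external projects whose metric profile is close to the target's."""
    target = project_profile(target_changes)
    selected = []
    for name, changes in external_projects.items():
        if np.linalg.norm(target - project_profile(changes)) <= max_dist:
            selected.append(name)
    return selected

# Toy example: 14 change metrics per commit, as in Kamei et al.'s data [10].
rng = np.random.default_rng(0)
target = rng.normal(size=(200, 14))                # target project's recent commits
externals = {
    "proj_a": rng.normal(size=(300, 14)),          # similar distribution -> kept
    "proj_b": rng.normal(loc=3.0, size=(300, 14)), # very different -> discarded
}
print(select_external_projects(target, externals))  # expected: ['proj_a']
```

In the thesis the method operates online and has an improved variant, and its actual similarity measure and selection rule may differ; the sketch only fixes the intuition of filtering external data at project granularity by similarity.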

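The constrained-optimization study can be illustrated in the same hedged way. The sketch below combines differential evolution (DE/rand/1/bin) with Deb's feasibility rule [61] and, as a stand-in for the thesis's adaptive constraint value (whose exact schedule the abstract does not specify), simply searches against a false-alarm budget tightened below the nominal limit so the returned solution keeps a safety margin against data shift. The proxy objective and constraint and all names and parameters are assumptions for illustration only.

```python
# Illustrative sketch only, not the thesis's implementation: constrained
# differential evolution with Deb's feasibility rule and a tightened
# false-alarm budget standing in for the adaptive constraint value.
import numpy as np

def recall_proxy(x):
    """Stand-in for defect-class recall of a JIT-SDP model (to maximize)."""
    return float(np.exp(-np.sum((x - 0.7) ** 2)))

def false_alarm_proxy(x):
    """Stand-in for the clean-class misclassification rate (to constrain)."""
    return float(np.mean(np.abs(x)))

def better(a, b, budget):
    """Deb's rule: feasible beats infeasible; ties broken by fitness/violation."""
    fa, ga = -recall_proxy(a), false_alarm_proxy(a) - budget
    fb, gb = -recall_proxy(b), false_alarm_proxy(b) - budget
    if ga <= 0 and gb <= 0:   # both feasible: compare objective values
        return fa <= fb
    if ga <= 0 or gb <= 0:    # exactly one feasible: it wins
        return ga <= 0
    return ga <= gb           # both infeasible: smaller violation wins

def constrained_de(dim=5, pop_size=20, gens=100, nominal_budget=0.5,
                   margin=0.1, F=0.5, CR=0.9, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical stand-in for the adaptive constraint value: search against
    # a budget stricter than the nominal one, leaving a margin for data shift.
    budget = nominal_budget - margin
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    for _ in range(gens):
        for i in range(pop_size):
            idx = rng.choice([j for j in range(pop_size) if j != i],
                             size=3, replace=False)
            a, b, c = pop[idx]
            mutant = a + F * (b - c)              # DE/rand/1 mutation
            cross = rng.random(dim) < CR          # binomial crossover
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            if better(trial, pop[i], budget):
                pop[i] = trial
    return min(pop, key=lambda x: (-recall_proxy(x), false_alarm_proxy(x)))

best = constrained_de()
print("false-alarm proxy of returned solution:", round(false_alarm_proxy(best), 3))
```

In the thesis the search is applied to an actual JIT-SDP model trained on open-source change data; the proxy functions above only keep the sketch self-contained and runnable.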
Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2021
Year of Degree Conferral
2024
Reference List

[1] FAVARÒ F M, JACKSON D W, SALEH J H, et al. Software contributions to aircraft adverse events: Case studies and analyses of recurrent accident patterns and failure mechanisms[J]. Reliability Engineering & System Safety, 2013, 113: 131-142.
[2] HALL T, BEECHAM S, BOWES D, et al. A systematic literature review on fault prediction performance in software engineering[J]. IEEE Transactions on Software Engineering, 2011, 38 (6): 1276-1304.
[3] KIM S, WHITEHEAD E J, ZHANG Y. Classifying software changes: Clean or buggy?[J]. IEEE Transactions on Software Engineering, 2008, 34(2): 181-196.
[4] 蔡亮, 范元瑞, 鄢萌, 等. 即时软件缺陷预测研究进展[J]. 软件学报, 2019, 30(5): 1288-1307.
[5] ŚLIWERSKI J, ZIMMERMANN T, ZELLER A. When do changes induce fixes?[J]. ACM SIGSOFT Software Engineering Notes, 2005, 30(4): 1-5.
[6] KIM S, ZIMMERMANN T, PAN K, et al. Automatic identification of bug-introducing changes [C]//International Conference on Automated Software Engineering. IEEE, 2006: 81-90.
[7] DA COSTA D A, MCINTOSH S, SHANG W Y, et al. A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes[J]. IEEE Transactions on Software Engineering, 2017, 43(7): 641-657.
[8] NETO E C, DA COSTA D A, KULESZA U. The impact of refactoring changes on the SZZ algorithm: An empirical study[C]//International Conference on Software Analysis, Evolution and Reengineering. IEEE, 2018: 380-390.
[9] BORG M, SVENSSON O, BERG K, et al. SZZ Unleashed: An open implementation of the SZZ algorithm - Featuring example usage in a study of just-in-time bug prediction for the Jenkins project[C]//ACM SIGSOFT International Workshop on Machine Learning Techniques for Software Quality Evaluation. Association for Computing Machinery, 2019: 7-12.
[10] KAMEI Y, SHIHAB E, ADAMS B, et al. A large-scale empirical study of just-in-time quality assurance[J]. IEEE Transactions on Software Engineering, 2012, 39(6): 757-773.
[11] TABASSUM S, MINKU L L, FENG D Y. Cross-project online just-in-time software defect prediction[J]. IEEE Transactions on Software Engineering, 2022, 49(1): 268-287.
[12] 陈丽琼, 王璨, 宋士龙. 一种即时软件缺陷预测模型及其可解释性研究[J]. 小型微型计算机系统, 2022, 43(4): 865-871.
[13] MCINTOSH S, KAMEI Y. Are fix-inducing changes a moving target? A longitudinal case study of just-in-time defect prediction[C]//Proceedings of the 40th International Conference on Software Engineering. Association for Computing Machinery, 2018: 560-560.
[14] CABRAL G G, MINKU L L, SHIHAB E, et al. Class imbalance evolution and verification latency in just-in-time software defect prediction[C]//International Conference on Software Engineering. IEEE, 2019: 666-676.
[15] CABRAL G G, MINKU L L. Towards reliable online just-in-time software defect prediction [J]. IEEE Transactions on Software Engineering, 2022, 49(3): 1342-1358.
[16] KAMEI Y, SHIHAB E, ADAMS B, et al. A large-scale empirical study of just-in-time quality assurance[J]. IEEE Transactions on Software Engineering, 2013, 39(6): 757-773.
[17] 葛建, 虞慧群, 范贵生, 等. 面向智能计算框架的即时缺陷预测[J]. 软件学报, 2023, 34(9): 3966-3980.
[18] WANG S, MINKU L L, YAO X. Resampling-based ensemble methods for online class imbalance learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2014, 27(5): 1356-1368.
[19] KAMEI Y, FUKUSHIMA T, MCINTOSH S, et al. Studying just-in-time defect prediction using cross-project models[J]. Empirical Software Engineering, 2016, 21: 2072-2106.
[20] SAHAR H, LIU Y, BANGASH A A, et al. IRJIT–An information retrieval technique for just-in-time defect identification[A/OL]. 2022. https://arxiv.org/abs/2210.02435.
[21] PORNPRASIT C, TANTITHAMTHAVORN C, JIARPAKDEE J, et al. Pyexplainer: Explaining the predictions of just-in-time defect models[C]//International Conference on Automated Software Engineering. IEEE, 2021: 407-418.
[22] ZHENG W, SHEN T R, CHEN X, et al. Interpretability application of the just-in-time software defect prediction model[J]. Journal of Systems and Software, 2022, 188: 111245.
[23] TABASSUM S, MINKU L L, FENG D Y, et al. An investigation of cross-project learning in online just-in-time software defect prediction[C]//International Conference on Software Engineering. Association for Computing Machinery, 2020: 554-565.
[24] SONG L Y, LI S X, MINKU L L, et al. A novel data stream learning approach to tackle one-sided label noise from verification latency[C]//International Joint Conference on Neural Networks. IEEE, 2022: 1-8.
[25] YANG X G, YU H Q, FAN G S, et al. DEJIT: A differential evolution algorithm for effort-aware just-in-time software defect prediction[J]. International Journal of Software Engineering and Knowledge Engineering, 2021, 31(3): 289-310.
[26] ZHANG T H R, YU Y, MAO X J, et al. FENSE: A feature-based ensemble modeling approach to cross-project just-in-time defect prediction[J]. Empirical Software Engineering, 2022, 27(7): 162.
[27] SHEHAB M A, HAMOU-LHADJ A, ALAWNEH L. ClusterCommit: A just-in-time defect prediction approach using clusters of projects[C]//International Conference on Software Analysis, Evolution and Reengineering. IEEE, 2022: 333-337.
[28] CHO Y, KWON J H, KO I Y. Cross-sub-project just-in-time defect prediction on multi-repo projects[C]//International Workshop on Quantitative Approaches to Software Quality. 2018: 2-9.
[29] YAO X, XU Y. Recent advances in evolutionary computation[J]. Journal of Computer Science and Technology, 2006, 21(1): 1-18.
[30] LIN D Y, TANTITHAMTHAVORN C, HASSAN A E. The impact of data merging on the interpretation of cross-project just-in-time defect models[J]. IEEE Transactions on Software Engineering, 2021, 48(8): 2969-2986.
[31] YANG X G, YU H Q, FAN G S, et al. An empirical study on optimal solutions selection strategies for effort-aware just-in-time software defect prediction[C]//Proceedings of the 31st International Conference on Software Engineering and Knowledge Engineering (SEKE). 2019: 319-324.
[32] KHANAN C, LUEWICHANA W, PRUKTHARATHIKOON K, et al. JITBot: An explainable just-in-time defect prediction bot[C]//Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering. Association for Computing Machinery, 2020: 1336-1339.
[33] YANG X G, YU H Q, FAN G S, et al. An empirical study of model-agnostic interpretation technique for just-in-time software defect prediction[C]//Collaborative Computing: Networking, Applications and Worksharing. Springer, 2021: 420-438.
[34] PORNPRASIT C, TANTITHAMTHAVORN C K. JITLine: A simpler, better, faster, finer-grained just-in-time defect prediction[C]//International Conference on Mining Software Repositories. IEEE, 2021: 369-379.
[35] TAN M, TAN L, DARA S, et al. Online defect prediction for imbalance data[C]//International Conference on Software Engineering. IEEE, 2015: 99-108.
[36] YANG X L, LO D, XIA X, et al. Deep learning for just-in-time defect prediction[C]//2015 IEEE International Conference on Software Quality, Reliability and Security. IEEE, 2015: 17-26.
[37] HOANG T, DAM H K, KAMEI Y, et al. DeepJIT: An end-to-end deep learning framework for just-in-time defect prediction[C]//2019 IEEE/ACM 16th International Conference on Mining Software Repositories. IEEE, 2019: 34-45.
[38] ZHAO Y, CHEN H. Deep incremental learning of imbalanced data for just-in-time software defect prediction[A/OL]. 2023. https://arxiv.org/abs/2310.12289.
[39] FUKUSHIMA T, KAMEI Y, MCINTOSH S, et al. An empirical study of just-in-time defect prediction using cross-project models[C]//Working Conference on Mining Software Repositories. Association for Computing Machinery, 2014: 172-181.
[40] SONG L Y, MINKU L, YAO X. On the validity of retrospective predictive performance evaluation procedures in just-in-time software defect prediction[J]. Empirical Software Engineering, 2023, 28(5): 124.
[41] YANG X G, YU H Q, FAN G S, et al. Local versus global models for just-in-time software defect prediction[J]. Scientific Programming, 2019, 2019.
[42] ZHU X Y, QIU T, WANG J Y, et al. A novel instance-based method for cross-project just-in-time defect prediction[J]. Software: Practice and Experience, 2024.
[43] DAVENPORT M A, BARANIUK R G, SCOTT C D. Controlling false alarms with support vector machines[C]//2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. IEEE, 2006: 589-592.
[44] BROADWATER J B, CARMEN C, LLORENS A J. False alarm constrained classification[C/OL]//International Conference and Exhibition on Underwater Acoustic Measurements: Technologies and Results. 2011: 347-354. https://api.semanticscholar.org/CorpusID:15744572.
[45] CHEN X, ZHAO Y Q, WANG Q P, et al. MULTI: Multi-objective effort-aware just-in-time software defect prediction[J]. Information and Software Technology, 2018, 93: 1-13.
[46] DEB K, PRATAP A, AGARWAL S, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II[J]. IEEE Transactions on Evolutionary Computation, 2002, 6(2): 182-197.
[47] CHEN X, XIA H L, PEI W L, et al. Boosting multi-objective just-in-time software defect prediction by fusing expert metrics and semantic metrics[J]. Journal of Systems and Software, 2023, 206: 111853.
[48] ZHENG S, GAI J J, YU H L, et al. Training data selection for imbalanced cross-project defect prediction[J]. Computers and Electrical Engineering, 2021, 94: 107370.
[49] ZIMMERMANN T, NAGAPPAN N, GALL H, et al. Cross-project defect prediction: a large scale experiment on data vs. domain vs. process[C]//Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering. Association for Computing Machinery, 2009: 91-100.
[50] ROSEN C, GRAWI B, SHIHAB E. Commit guru: Analytics and risk prediction of software commits[C]//Joint Meeting on Foundations of Software Engineering. Association for Computing Machinery, 2015: 966-969.
[51] FAWCETT T. An introduction to ROC analysis[J]. Pattern Recognition Letters, 2006, 27(8): 861-874.
[52] WOOLSON R F. Wilcoxon signed-rank test[J]. Wiley Encyclopedia of Clinical Trials, 2007: 1-3.
[53] MIRJALILI S, MIRJALILI S. Genetic algorithm[J]. Evolutionary Algorithms and Neural Networks: Theory and Applications, 2019: 43-55.
[54] KUBAT M, HOLTE R C, MATWIN S. Machine learning for the detection of oil spills in satellite radar images[J]. Machine Learning, 1998, 30: 195-215.
[55] MATTHEWS B W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme[J]. Biochimica et Biophysica Acta-Protein Structure, 1975, 405(2): 442-451.
[56] WANG S, MINKU L L, YAO X. A systematic study of online class imbalance learning with concept drift[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 29(10): 4802-4821.
[57] CHICCO D, JURMAN G. The advantages of the Matthews Correlation Coefficient (MCC) over F1 score and accuracy in binary classification evaluation[J]. BMC Genomics, 2020, 21(1): 1-13.
[58] CHICCO D, WARRENS M, JURMAN G. The Matthews Correlation Coefficient (MCC) is more informative than Cohen's kappa and Brier score in binary classification assessment[J]. IEEE Access, 2021, 9: 78368-78381.
[59] SONG L Y, MINKU L L, TENG C, et al. A practical human labeling method for online just-in-time software defect prediction[C]//Proceedings of the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE). Association for Computing Machinery, 2023: 605-617.
[60] TANTITHAMTHAVORN C, MCINTOSH S, HASSAN A E, et al. The impact of automated parameter optimization on defect prediction models[J]. IEEE Transactions on Software Engineering, 2018, 45(7): 683-711.
[61] DEB K. An efficient constraint handling method for genetic algorithms[J]. Computer Methods in Applied Mechanics and Engineering, 2000, 186(2-4): 311-338.
[62] LOH W L. On Latin hypercube sampling[J]. The Annals of Statistics, 1996, 24(5): 2058-2080.
[63] DAWAR D, LUDWIG S A. Differential evolution with dither and annealed scale factor[C]// 2014 IEEE Symposium on Differential Evolution. IEEE, 2014: 1-8.
[64] DEB K. An efficient constraint handling method for genetic algorithms[J]. Computer Methods in Applied Mechanics and Engineering, 2000, 186(2-4): 311-338.
[65] GERALD B. A brief review of independent, dependent and one sample t-test[J]. International Journal of Applied Mathematics and Theoretical Physics, 2018, 4(2): 50-54.
[66] WILCOXON F. Individual comparisons by ranking methods[M]. Springer, 1992.
[67] QUIÑONERO-CANDELA J, SUGIYAMA M, SCHWAIGHOFER A, et al. Dataset shift in machine learning[M]. MIT Press, 2008.

Degree Assessment Subcommittee
Electronic Science and Technology
Chinese Library Classification Number
TP311.5
Source Repository
Manual submission
Document Type
Thesis
Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/765956
Collection
Southern University of Science and Technology
College of Engineering_Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
滕聪. 面向数据紧缺和约束违反问题的即时软件缺陷预测方法研究[D]. 深圳: 南方科技大学, 2024.
Files in This Item
12132358-滕聪-计算机科学与工程 (2571 KB), restricted access