Title

MULTI-MODEL TRAFFIC ACCIDENT SEVERITY CLASSIFICATION FRAMEWORK

Other title
基于多模型的交通事故严重程度分类框架
Name
Name (pinyin)
ZHANG Anyi
Student ID
12132911
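Degree type
Master's
Degree discipline
0701 Mathematics
Discipline category / professional degree category
07 Science
Supervisor
杨丽丽
Supervisor's affiliation
Department of Statistics and Data Science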
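Thesis defense date
2023-05-07
Thesis submission date
2023-06-29
Degree-granting institution
Southern University of Science and Technology (南方科技大学)
Place of degree conferral
Shenzhen
Abstract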
Serious traffic accidents can cause great personal injury and property damage. Typically, a serious traffic accident is defined as one involving one or more fatalities, three or more serious injuries, or 30,000 or more in property damage. In this study, following the 2009 edition of the US Manual on Uniform Traffic Control Devices, we instead define a serious traffic accident as one with a clearance time greater than 120 minutes. Accurate prediction of accident severity can help road managers make more effective decisions and reduce property damage. Both traditional statistical methods and machine learning methods have been used to predict accident severity. Traditional statistical models each rest on their own assumptions and on presupposed relationships between the dependent and independent variables, while machine learning models lack interpretability and are computationally slow. To overcome these problems, we propose a multi-model traffic accident severity classification framework with three components: pre-processing of the imbalanced data, variable selection, and model building, in which the selected variables are used to fit a logistic model and three hybrid models (RF-SVM, RF-BPNN, and RF-BN). The study uses traffic accident data from four highways in Shandong Province. First, we resample the raw data by random oversampling and by random undersampling, producing two resampled datasets. Variable selection by stepwise regression then yields 21 variables. The 14 significant variables among these 21 are ranked by importance using random forest, the 10 most important are selected, and RF-SVM, RF-BPNN, and RF-BN are built on these ten variables. The evaluation criterion is accuracy. Comparing the results, the RF-SVM model achieved the highest prediction accuracy, 0.98, on the oversampled dataset.
This framework can be used to predict the severity of traffic accidents and to manage them in a timely manner, reducing casualties and property damage.
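
The first stage described in the abstract (labelling by clearance time, then random oversampling and random undersampling) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the thesis's own code: the function names, the fixed seed, and the sample clearance times are all assumptions made for the example, and only the 120-minute threshold comes from the abstract.

```python
import random
from collections import Counter

# Threshold stated in the abstract: clearance time above 120 minutes is "serious".
SERIOUS_CLEARANCE_MINUTES = 120


def label_severity(clearance_minutes):
    """Return 1 for a serious accident (clearance time > 120 min), else 0."""
    return 1 if clearance_minutes > SERIOUS_CLEARANCE_MINUTES else 0


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]  # sampling with replacement
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to match the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        out_rows.extend(rng.sample(pool, target))  # sampling without replacement
        out_labels.extend([cls] * target)
    return out_rows, out_labels


# Hypothetical clearance times in minutes (illustrative only, not thesis data).
clearance = [15, 30, 45, 60, 90, 110, 130, 240]
labels = [label_severity(m) for m in clearance]

over_rows, over_labels = random_oversample(clearance, labels)
under_rows, under_labels = random_undersample(clearance, labels)

print(Counter(labels), Counter(over_labels), Counter(under_labels))
```

In common practice, resampling of this kind is applied only to the training split, so that duplicated records do not also appear in the test data; the abstract does not say how the thesis handled this.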
Keywords
Language
English
Training category
Independent training
Year of enrollment
2021
Year of degree conferral
2023-06-30
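Degree evaluation subcommittee
Mathematics
Chinese Library Classification number
O212.1
Source database
Manual submission
Output type: Dissertation
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/544571
Collection: College of Science_Department of Statistics and Data Science
Recommended citation
GB/T 7714
Zhang AY. MULTI-MODEL TRAFFIC ACCIDENT SEVERITY CLASSIFICATION FRAMEWORK[D]. Shenzhen: Southern University of Science and Technology (南方科技大学), 2023.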
Files in this item
File name/size: 12132911-张安仪-统计与数据科学 (2034KB)
Access type: Restricted access