Title

Normal Approximation Error for Generalized U-statistics with Applications

Alternative Title
广义U-统计量的正态逼近误差及其应用
Name
Name (Pinyin)
LI Caiqi
Student ID
12132903
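Degree Type
Master's
Degree Discipline
0701 Mathematics
Discipline Category / Professional Degree Category
07 Science
Supervisor
SHAO Qiman (邵启满)
Supervisor's Affiliation
Department of Statistics and Data Science
Thesis Defense Date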
2023-05-15
Thesis Submission Date
2023-06-28
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract
The generalized U-statistic extends the classical U-statistic and has a wide range of practical applications. In this thesis, we extend a form of the generalized U-statistic abstracted from the random forest method and use Stein's method to study the rate of convergence to normality of the extended statistic. A random forest is an ensemble learning method that consists of multiple decision trees and has strong generalization capability. In the random forest method studied in this thesis, both the composition of each decision tree and the decision whether to construct that tree are random. Random forests can be used for classification or regression. In regression, the output is the average of the outputs of all decision trees, so it can be regarded either as a weighted U-statistic whose weights are random variables following a binomial distribution or as an incomplete U-statistic. In studying this type of U-statistic, we first extend it to generalized U-statistics whose weights are general independent positive random variables. We then apply Stein's method to the extended statistic and, under the condition that the fourth moments of the weights are finite, obtain the optimal uniform Berry-Esseen bound with convergence rate $n^{-1/2}$. In practice, applying the properties of a statistic requires replacing its unknown parameters with observable quantities and establishing the properties of the statistic after this replacement. We therefore use the jackknife estimator of the variance to construct a studentized version of the extended generalized U-statistic. We also modify the relevant corollaries of the moderate deviation theorem of Shao and Zhou (2016) and use them to derive the Berry-Esseen bound for the studentized generalized U-statistic. Finally, we apply these results to random forests and give the rates of convergence to normality of the random forest output, both when the variance is known and when the unknown variance is replaced by its jackknife estimator.
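As an illustration of the objects named in the abstract, the following minimal Python sketch builds a randomly weighted U-statistic of order 2 and studentizes it with a jackknife variance estimate. It is not the thesis's construction: the kernel, the Bernoulli(p) pair weights, the normalization by the realized total weight, and all function names are assumptions chosen only to make the idea concrete.

```python
import itertools
import math
import random


def half_squared_difference(x, y):
    """Order-2 kernel; its complete U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


def draw_weights(n, p, rng):
    """Independent Bernoulli(p) weight for every pair {i, j} with i < j (assumed weight law)."""
    return {pair: 1.0 if rng.random() < p else 0.0
            for pair in itertools.combinations(range(n), 2)}


def weighted_u_statistic(data, kernel, weights, skip=None):
    """Weighted average of kernel values over pairs, optionally omitting pairs containing `skip`."""
    numerator = 0.0
    total_weight = 0.0
    for (i, j), w in weights.items():
        if skip is not None and skip in (i, j):
            continue
        numerator += w * kernel(data[i], data[j])
        total_weight += w
    if total_weight == 0.0:
        raise ValueError("all weights are zero; the statistic is undefined")
    return numerator / total_weight


def jackknife_variance(data, kernel, weights):
    """Leave-one-out jackknife estimate of the variance of the weighted U-statistic."""
    n = len(data)
    leave_one_out = [weighted_u_statistic(data, kernel, weights, skip=i) for i in range(n)]
    center = sum(leave_one_out) / n
    return (n - 1) / n * sum((v - center) ** 2 for v in leave_one_out)


# Usage: standard normal data, so the target parameter (the variance) is 1.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(30)]
weights = draw_weights(len(data), p=0.5, rng=rng)
u_value = weighted_u_statistic(data, half_squared_difference, weights)
standard_error = math.sqrt(jackknife_variance(data, half_squared_difference, weights))
studentized = (u_value - 1.0) / standard_error
print(u_value, studentized)
```

With every weight set to 1 the statistic reduces to the ordinary complete U-statistic, which for this kernel is the unbiased sample variance. The Berry-Esseen bounds described in the abstract concern how closely the distribution of a statistic like `studentized` follows the standard normal distribution as the sample size grows.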
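Alternative Abstract
The generalized U-statistic is a generalization of the U-statistic; it is widely applied in practice and is well worth studying. This thesis extends a generalized U-statistic abstracted from the random forest method and uses Stein's method to study the rate of normal convergence of the extended statistic. The random forest method is an ensemble learning method built from multiple decision trees and has strong generalization capability. The random forest method studied here requires that both the composition of each decision tree and the decision whether to construct that tree be random. Random forests can be used for both classification and regression. When a random forest is used for regression, its output is the average of the outputs of the individual trees, so the output can be regarded as a weighted U-statistic whose weights are binomially distributed random variables, or as an incomplete U-statistic. In studying this type of U-statistic, we first generalize it by taking the weights to be general, mutually independent random variables that are greater than 0. On this basis we use Stein's method to give, when the fourth moments of the weight random variables exist, a uniform Berry-Esseen bound for the extended generalized U-statistic, attaining the optimal convergence rate $n^{-1/2}$ achievable with this method. In practice, applying the properties of a statistic requires replacing its unknown parameters with observable values and establishing the properties of the statistic after the replacement. This thesis therefore uses the jackknife estimator of the variance to construct a studentized statistic for the extended generalized U-statistic. It modifies the relevant corollaries of the moderate deviation theorem proposed by Shao and Zhou (2016) and uses them to derive the Berry-Esseen bound for the studentized generalized U-statistic. Finally, based on the results for the extended generalized U-statistic, the thesis treats the random forest case separately and gives the rate of normal convergence of the random forest output when the variance is known, as well as when the unknown variance is replaced by its jackknife estimator.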
Keywords
Alternative Keywords
Language
English
Training Type
Independent training
Year of Enrollment
2021
Year of Degree Conferral
2023-06
References

[1] HOEFFDING W. A class of statistics with asymptotically normal distributions[J]. Annals of Mathematical Statistics, 1948, 19(3): 293-325.
[2] DEHLING H, TAQQU M S. The Empirical Process of some Long-Range Dependent Sequences with an Application to U-Statistics[J]. The Annals of Statistics, 1989, 17(4): 1767-1783.
[3] CHEN L H, SHAO Q M. Normal approximation for nonlinear statistics using a concentration inequality approach[J]. Bernoulli, 2007, 13(2): 581-599.
[4] JANSON S, NOWICKI K. The asymptotic distributions of generalized U-statistics with applications to random graphs[J]. Probability theory and related fields, 1991, 90: 341-375.
[5] YU Q, TANG W, KOWALSKI J, et al. Multivariate U-statistics: a tutorial with applications[J]. Wiley Interdisciplinary Reviews: Computational Statistics, 2011, 3(5): 457-471.
[6] MENTCH L, HOOKER G. Quantifying uncertainty in random forests via confidence intervals and hypothesis tests[J]. The Journal of Machine Learning Research, 2016, 17(1): 841-881.
[7] PENG W, COLEMAN T, MENTCH L. Rates of convergence for random forests via generalized U-statistics[J]. Electronic Journal of Statistics, 2022, 16(1): 232-292.
[8] CALLAERT H, VERAVERBEKE N. The Order of the Normal Approximation for a Studentized U-Statistic[J]. The Annals of Statistics, 1981, 9(1): 194-200.
[9] HELMERS R. The Berry-Esseen bound for Studentized U-statistics[J]. Canadian Journal of Statistics, 1985, 13(1): 79-82.
[10] SHAO Q M, ZHOU W X. Cramér type moderate deviation theorems for self-normalized processes[J]. Bernoulli, 2016, 22(4): 2029-2079.
[11] ARVESEN J N. Jackknifing U-statistics[J]. The Annals of Mathematical Statistics, 1969, 40(6): 2076-2100.
[12] SHAPIRO C P, HUBERT L. Asymptotic normality of permutation statistics derived from weighted sums of bivariate functions[J]. The Annals of Statistics, 1979, 7(4): 788-794.
[13] O’NEIL K A, REDNER R A. Asymptotic Distributions of Weighted U-Statistics of Degree 2[J]. The Annals of Probability, 1993, 21(2): 1159-1169.
[14] HSING T, WU W B. On weighted U-statistics for stationary processes[J]. The Annals of Probability, 2004, 32(2): 1600-1631.
[15] HU H, SHAO Q M. Non-uniform Berry–Esseen Bounds for Weighted U-Statistics and Generalized L-Statistics[J]. Communications in Mathematics and Statistics, 2013, 1(3): 351-367.
[16] PRIVAULT N, SERAFIN G. Normal approximation for generalized U-statistics and weighted random graphs[J]. Stochastics, 2022, 94(3): 432-458.
[17] ZHANG Z S. Berry–Esseen bounds for generalized U-statistics[J]. Electronic Journal of Probability, 2022, 27: 1-36.
[18] JANSON S. The asymptotic distributions of incomplete U-statistics[J]. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 1984, 66(4): 495-505.
[19] BERTAIL P, TRESSOU J. Incomplete generalized U-statistics for food risk assessment[J]. Biometrics, 2006, 62(1): 66-74.
[20] KONG X, ZHENG W. Design based incomplete U-statistics[J/OL]. Statistica Sinica, 2021, 31(3): 1593-1618.
[21] CHEN X, KATO K. Randomized incomplete U-statistics in high dimensions[J]. The Annals of Statistics, 2019, 47(6): 3127-3156.
[22] NASARI M M. Weighted approximations for Studentized U-statistics[J]. Brazilian Journal of Probability and Statistics, 2014, 28(1): 24-60.
[23] CHANG J, SHAO Q M, ZHOU W X. Cramér-type moderate deviations for Studentized two-sample U-statistics with applications[J]. The Annals of Statistics, 2016, 44(5): 1931-1956.
[24] KHAN M A, SHAH M I, JAVED M F, et al. Application of random forest for modelling of surface water salinity[J]. Ain Shams Engineering Journal, 2022, 13(4): 101635.
[25] FAN G F, ZHANG L Z, YU M, et al. Applications of random forest in multivariable response surface for short-term load forecasting[J]. International Journal of Electrical Power & Energy Systems, 2022, 139: 108073.
[26] WEN Y, WU R, ZHOU Z, et al. A data-driven method of traffic emissions mapping with land use random forest models[J]. Applied Energy, 2022, 305: 117916.
[27] BAI J, LI Y, LI J, et al. Multinomial random forest[J]. Pattern Recognition, 2022, 122: 108331.
[28] WANG J, RAO C, GOH M, et al. Risk assessment of coronary heart disease based on cloud random forest[J]. Artificial Intelligence Review, 2023, 56(1): 203-232.
[29] CHEN L H, GOLDSTEIN L, SHAO Q M. Normal approximation by Stein’s method: volume 2[M]. Springer Berlin, Heidelberg, 2011.
[30] CHEN L H, SHAO Q M. Normal approximation under local dependence[J]. The Annals of Probability, 2004, 32(3): 1985-2028.
[31] ALTUĞ Y, WAGNER A B. Moderate deviations in channel coding[J]. IEEE Transactions on Information Theory, 2014, 60(8): 4417-4426.
[32] JACQUIER A, PANNIER A. Large and moderate deviations for stochastic Volterra systems[J]. Stochastic Processes and their Applications, 2022, 149: 142-187.
[33] HASEENA A, SUVINTHRA M, MOHAN M T, et al. Moderate deviations for stochastic tidal dynamics equations with multiplicative Gaussian noise[J]. Applicable Analysis, 2022, 101(4): 1456-1490.
[34] DE LA PEÑA V H, LAI T L, SHAO Q M. Cramér-Type Moderate Deviations for Self-Normalized Sums[M/OL]. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009: 87-106. DOI: 10.1007/978-3-540-85636-8_7.
[35] DÖRING H, EICHELSBACHER P. Moderate deviations via cumulants[J]. Journal of Theoretical Probability, 2013, 26: 360-385.
[36] FAN X, SHAO Q M. Cramér’s moderate deviations for martingales with applications[M/OL]. arXiv, 2022. https://arxiv.org/abs/2204.02562.
[37] LAI T L, SHAO Q M, WANG Q. Cramér type moderate deviations for Studentized U-statistics[J]. ESAIM: Probability and Statistics, 2011, 15: 168-179.
[38] BLOZNELIS M, GÖTZE F. Orthogonal decomposition of finite population statistics and its applications to distributional asymptotics[J]. The Annals of Statistics, 2001, 29(3): 899-917.
[39] CHASTAING G, GAMBOA F, PRIEUR C. Generalized Hoeffding-Sobol decomposition for dependent variables-application to sensitivity analysis[J]. Electronic Journal of Statistics, 2012, 6: 2420-2448.
[40] SHAO Q M, ZHANG Z S. Berry–Esseen bounds for multivariate nonlinear statistics with applications to M-estimators and stochastic gradient descent algorithms[J]. Bernoulli, 2022, 28(3): 1548-1576.
[41] VERSHYNIN R. Cambridge Series in Statistical and Probabilistic Mathematics: High-Dimensional Probability: An Introduction with Applications in Data Science[M]. Cambridge University Press, 2018.

Degree Evaluation Subcommittee
Mathematics
Chinese Library Classification Number
O211.4
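Source Repository
Manual submission
Item Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544443
Collection
College of Science_Department of Statistics and Data Science
Recommended Citation
GB/T 7714
Li CQ. Normal Approximation Error for Generalized U-statistics with Applications[D]. Shenzhen: Southern University of Science and Technology, 2023.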
Files in This Item
File Name/Size: 12132903-李才奇-统计与数据科学 (2327KB)
Access Type: Restricted access
