Title

Distributed statistical inference of weighted statistics under heterogeneity

Alternative title
异质性下分布式加权统计量的统计推断
Name
马锋利
Name (pinyin)
MA Fengli
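Student ID
12232871
Degree type
Master's
Degree discipline
0701 Mathematics
Discipline category / professional degree category
07 Science
Supervisor
SHAO Qi-Man (邵启满)
Supervisor's affiliation
Department of Statistics and Data Science
Thesis defense date
2024-05-20
Thesis submission date
2024-07-01
Degree-granting institution
Southern University of Science and Technology
Place of degree conferral
Shenzhen
Abstract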

We study the asymptotic properties of a distributed statistic based on self-normalized variance minimization under heteroscedasticity. Because the variances of the random variables differ across blocks, we incorporate variance information by weighting the estimates obtained from each block. Without self-normalization, the distributed statistic is a sum of independent random variables, so by the classical Berry-Esseen inequality it converges to the standard normal distribution at a rate of $O(1 / \sqrt{N})$ whenever the absolute third moment exists. When the variances are unknown, however, the population variances must be replaced by estimates, which introduces strong dependence between the weights and the random variables.

In this thesis, we employ the concentration inequalities proposed by Shao and Zhou (2016) to study self-normalized distributed statistics and, under a sixth-moment condition, obtain a convergence rate of $O(K / \sqrt{N})$. Simulation experiments and theoretical analysis show that, for a given $N$, the expectation of the self-normalized distributed statistic is biased when the number of blocks $K$ is large, and that this bias grows with $K$ until the statistic no longer converges to the standard normal distribution. The simulations also show that removing this bias allows the statistic to converge to a normal distribution even when $K$ is large. To address this problem, we propose a debiased self-normalized distributed statistic, establish its optimal convergence rate, and show that it expands the range of feasible $K$.
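The bias described in the abstract can be illustrated with a small simulation. The sketch below is not the thesis's construction: it assumes one plausible form of the statistic, in which $K$ blocks of equal size $n = N/K$ are combined with inverse-variance weights estimated from each block, and it assumes centered exponential noise with block standard deviations spread evenly between 1 and 3. The names `self_normalized_stat` and `simulate_mean` are hypothetical.

```python
import numpy as np


def self_normalized_stat(x, mu):
    """Assumed form of the statistic for data of shape (..., K, n).

    T = sum_k w_k * (mean_k - mu) / sqrt(sum_k w_k), with w_k = n / s_k^2,
    where s_k^2 is the sample variance of block k.
    """
    n = x.shape[-1]
    block_mean = x.mean(axis=-1)
    block_var = x.var(axis=-1, ddof=1)  # estimated, hence random, weights
    w = n / block_var
    return (w * (block_mean - mu)).sum(axis=-1) / np.sqrt(w.sum(axis=-1))


def simulate_mean(N, K, reps=500, seed=0):
    """Monte Carlo mean of the statistic over `reps` replications."""
    rng = np.random.default_rng(seed)
    n = N // K
    # Assumption: heteroscedastic blocks with std devs between 1 and 3.
    sigma = np.linspace(1.0, 3.0, K)
    # Assumption: skewed noise (centered exponential), true mean 0.
    noise = rng.exponential(1.0, size=(reps, K, n)) - 1.0
    x = sigma[None, :, None] * noise
    return self_normalized_stat(x, 0.0).mean()


if __name__ == "__main__":
    N = 2000
    for K in (2, 20, 200):
        print(f"K={K:4d}  mean of statistic = {simulate_mean(N, K):.3f}")
```

Under these assumptions the sample mean and sample variance of a skewed block are correlated, so the estimated weights pull the statistic away from zero, and a first-order calculation puts the resulting shift at order $K / \sqrt{N}$. With symmetric noise this particular effect would largely vanish, so the choice of a skewed distribution matters for the illustration.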

Keywords
Language
English
Training category
Independent training
Year of enrollment
2022
Year of degree conferral
2024-06
References

[1] CHEN S X, PENG L. Distributed Statistical Inference for Massive Data[J]. The Annals of Statistics, 2021, 49(5): 2851-2869.
[2] GU J, CHEN S X. Distributed Statistical Inference under Heterogeneity[J]. Journal of Machine Learning Research, 2023, 24(387): 1-57.
[3] DUAN R, NING Y, CHEN Y. Heterogeneity–aware and Communication–efficient Distributed Statistical Inference[J]. Biometrika, 2022, 109(1): 67-83.
[4] ZHAO T, CHENG G, LIU H. A Partially Linear Framework for Massive Heterogeneous Data[J]. The Annals of Statistics, 2016, 44(4): 1400.
[5] CAI T T, WEI H. Distributed Adaptive Gaussian Mean Estimation with Unknown Variance: Interactive Protocol Helps Adaptation[J]. The Annals of Statistics, 2022, 50(4): 1992-2020.
[6] FISCHER H. A History of the Central Limit Theorem: from Classical to Modern Probability Theory: volume 4[M]. Springer, 2011.
[7] LAPLACE P S. Sur les Approximations des Formules qui sont Fonctions de tres Grands Nombres et Sur leur Application aux Probabilites[J]. Œuvres complètes, 1810, 12: 301-345.
[8] MARQUIS DE LAPLACE P S. Théorie Analytique des Probabilités: volume 7[M]. Courcier, 1820.
[9] LIAPOUNOFF A. Sur une Proposition de la théorie des Probabilités[J]. Известия Российской академии наук. Серия математическая, 1900, 13(4): 359-386.
[10] LINDEBERG J W. Über das Gauss’sche Fehlergesetz[J]. Skandinavisk Aktuarietidskrift, 1922, 5: 217-234.
[11] LINDEBERG J W. Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung[J]. Mathematische Zeitschrift, 1922, 15(1): 211-225.
[12] BROWN B M. Martingale Central Limit Theorems[J]. The Annals of Mathematical Statistics, 1971: 59-66.
[13] ROMANO J P, WOLF M. A More General Central Limit Theorem for M-dependent Random Variables with Unbounded[J]. Statistics & Probability Letters, 2000, 47(2): 115-124.
[14] STEIN C. A Bound for the Error in the Normal Approximation to the Distribution of a Sum of Dependent Random Variables[C]//Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability, Volume 2: volume 6. University of California Press, 1972: 583-602.
[15] SHAO Q M, ZHANG K, ZHOU W X. Stein’s method for nonlinear statistics: A brief survey and recent progress[J]. Journal of Statistical Planning and Inference, 2016, 168: 68-89.
[16] BERRY A C. The Accuracy of the Gaussian Approximation to the Sum of Independent Variates[J]. Transactions of the American Mathematical Society, 1941, 49(1): 122-136.
[17] ESSEEN C G. On the Liapunoff Limit of Error in the Theory of Probability[J]. Arkiv för Matematik, Astronomi och Fysik, 1942, A28: 1-19.
[18] DIACONIS P. The Distribution of Leading Digits and Uniform Distribution Mod 1[J]. The Annals of Probability, 1977, 5(1): 72-81.
[19] BALDI P, RINOTT Y, STEIN C. A Normal Approximation for the Number of Local Maxima of a Random Function on a Graph[M]//Probability, statistics, and mathematics. Elsevier, 1989: 59-81.
[20] CHEN L H, SHAO Q M. A Non-uniform Berry–Esseen Bound via Stein’s Method[J]. Probability Theory and Related Fields, 2001, 120: 236-254.
[21] CHEN L H, SHAO Q M. Normal Approximation under Local Dependence[J]. The Annals of Probability, 2004, 32(3): 1985-2028.
[22] CHEN L H, SHAO Q M. Normal Approximation for Nonlinear Statistics Using a Concentration Inequality approach[J]. Bernoulli, 2007, 13(2): 581-599.
[23] CHEN L H, GOLDSTEIN L, SHAO Q M. Normal Approximation by Stein’s Method: Volume 2[M]. Springer, 2011.
[24] SHAO Q M. Stein’s method, Self-normalized Limit Theory and Applications[C]//Proceedings of the International Congress of Mathematicians: IV. New Delhi: Hindustan Book Agency, 2010: 2325-2350.
[25] SHAO Q M, ZHOU W X. Cramér Type Moderate Deviation Theorems for Self-normalized Processes[J]. Bernoulli, 2016, 22(4): 2029-2079.
[26] WANG Q, JING B Y, ZHAO L. The Berry–Esseen Bound for Studentized Statistics[J]. The Annals of Probability, 2000, 28(1): 511-535.
[27] GAO L, SHAO Q M, SHI J. Refined Cramér-type Moderate Deviation Theorems for General Self-normalized Sums with Applications to Dependent Random Variables and Winsorized Mean[J]. The Annals of Statistics, 2022, 50(2): 673-697.
[28] CRAMÉR H. Sur un nouveau théorème-limite de la théorie des probabilités[J]. Actualités Scientifiques et Industrielles, 1938, 736: 5-23.
[29] JING B Y, SHAO Q M, WANG Q. Self-normalized Cramér-type Large Deviations for Independent Random Variables[J]. The Annals of Probability, 2003, 31(4): 2167-2215.
[30] WANG Q. Refined Self-normalized Large Deviations for Independent Random Variables[J]. Journal of Theoretical Probability, 2011, 24(2): 307-329.
[31] ZHANG Y, WAINWRIGHT M J, DUCHI J C. Communication-efficient algorithms for statistical optimization[J]. Advances in Neural Information Processing Systems, 2012, 25.
[32] CAI T T, WANG Y, ZHANG L. The Cost of Privacy: Optimal Rates of Convergence for Parameter Estimation with Differential Privacy[J]. The Annals of Statistics, 2021, 49(5): 2825-2850.
[33] WU S, HUANG D, WANG H. Quasi-Newton Updating for Large-scale Distributed Learning[J]. Journal of the Royal Statistical Society Series B: Statistical Methodology, 2023, 85(4): 1326-1354.
[34] PAN R, REN T, GUO B, et al. A Note on Distributed Quantile Regression by Pilot Sampling and One-Step Updating[J]. Journal of Business & Economic Statistics, 2022, 40(4): 1691-1700.
[35] YU Y, CHAO S K, CHENG G. Distributed bootstrap for simultaneous inference under high dimensionality[J]. Journal of Machine Learning Research, 2022, 23(195): 1-77.
[36] CAI T T, WEI H. Distributed Gaussian Mean Estimation under Communication Constraints: Optimal Rates and Communication-Efficient Algorithms[J]. Journal of Machine Learning Research, 2024, 25(37): 1-63.
[37] CAI T T, WEI H. Distributed Nonparametric Function Estimation: Optimal Rate of Convergence and Cost of Adaptation[J]. The Annals of Statistics, 2022, 50(2): 698-725.
[38] LI H, LINDSAY B, WATERMAN R. Efficiency of Projected Score Methods in Rectangular Array Asymptotics[J]. Journal of the Royal Statistical Society Series B, 2003, 65(2): 191-208.
[39] SHAO Q M. An Explicit Berry–Esseen Bound for Student’s T-statistic via Stein’s Method[M]// Lect. Notes Ser. Inst. Math. Sci. Natl. Univ. Singap.: number 5 Stein’s Method and Applications. Singapore Univ. Press, 2005: 143-155.
[40] LEUNG D, SHAO Q M. Nonuniform Berry–Esseen Bounds for Studentized U-statistics[A]. 2024. arXiv: 2303.08619.
[41] VON BAHR B, ESSEEN C G. Inequalities for the rth Absolute Moment of a Sum of Random Variables, 1≤ r≤ 2[J]. The Annals of Mathematical Statistics, 1965: 299-303.
[42] CHEN L H, GOLDSTEIN L, SHAO Q M. Normal Approximation by Stein’s Method: volume 2[M]. Springer Berlin, Heidelberg, 2011.
[43] HOEFFDING W. A Class of Statistics with Asymptotically Normal Distributions[J]. The Annals of Mathematical Statistics, 1948, 19(3): 293-325.
[44] CHANG J, SHAO Q M, ZHOU W X. Cramér-type Moderate Deviations for Studentized Two-sample U-statistics with Applications[J]. The Annals of Statistics, 2016, 44(5): 1931-1956.
[45] HASEENA A, SUVINTHRA M, MOHAN M T, et al. Moderate Deviations for Stochastic Tidal Dynamics Equations with Multiplicative Gaussian Noise[J]. Applicable Analysis, 2022, 101(4): 1456-1490.
[46] SHAO Q M, ZHANG Z S. Berry–Esseen Bounds for Multivariate Nonlinear Statistics with Applications to M-estimators and Stochastic Gradient Descent Algorithms[J]. Bernoulli, 2022, 28(3): 1548-1576.
[47] DE LA PEÑA V H, LAI T L, SHAO Q M. Cramér-Type Moderate Deviations for Self-Normalized Sums[M]. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009: 87-106.
[48] BERNSTEIN S. On a Modification of Chebyshev’s Inequality and of the Error Formula of Laplace[J]. Annals Science Institute SAV. Ukraine, Sect. Math, 1924, 1(4): 38-49.
[49] BENTKUS V, JING B Y, SHAO Q M, et al. Limiting Distributions of the Non-central T-statistic and Their Applications to the Power of T-tests under Non-normality[J]. Bernoulli, 2007, 13(2): 346-364.
[50] GAO Y, LIU W, WANG H, et al. A Review of Distributed Statistical Inference[J]. Statistical Theory and Related Fields, 2022, 6(2): 89-99.
[51] CHEBYSHEV P L. Des valeurs moyennes[J]. Journal de Mathématiques Pures et Appliquées, 1867, 12: 177-184.
[52] LÉVY P. Théorie des Erreurs. La loi de Gauss et les lois exceptionnelles[J]. Bulletin de la Societé mathématique de France, 1924, 52: 49-85.

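Degree evaluation subcommittee
Mathematics
Chinese Library Classification number
O211.4
Source database
Manual submission
Output type
Thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/778739
Collection
College of Science_Department of Statistics and Data Science
Recommended citation
GB/T 7714
Ma FL. Distributed statistical inference of weighted statistics under heterogeneity[D]. Shenzhen: Southern University of Science and Technology, 2024.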
Files in this item
File name/size | Document type | Version type | Access type | License
12232871-马锋利-统计与数据科学 (5884KB) | -- | -- | Restricted access | --