Title

TESTS OF HIGH-DIMENSIONAL COMPOSITIONAL DATA

Alternative title
高维成分数据的检验 (Tests of High-Dimensional Compositional Data)
Name
李文博 (Li Wenbo)
Name (pinyin)
LI Wenbo
Student ID
12232887
Degree type
Master's
Degree major
0701 Mathematics
Discipline category / professional degree category
07 Science
Supervisor
李曾 (Li Zeng)
Supervisor's department
Department of Statistics and Data Science
Thesis defense date
2024-05-20
Thesis submission date
2024-07-04
Degree-granting institution
Southern University of Science and Technology (南方科技大学)
Place of degree conferral
Shenzhen
Abstract

The human microbiome plays a crucial role in human health and disease. How to analyze microbiome data comprehensively and explore the relationship between microorganisms and human health is an active research topic. High-dimensional compositional data are an important type of microbiome data, and statistical inference on such data lets us investigate in depth the potential relationships between the human microbiome and health and disease. This thesis focuses on hypothesis testing problems for high-dimensional compositional data, including one-sample and two-sample tests of means and tests of covariance matrices. We propose new testing methods that address a limitation of existing test statistics: they do not accommodate the sum-to-one constraint of high-dimensional compositional data.
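The sum-to-one constraint is what prevents standard high-dimensional tests from being applied directly. Below is a minimal Python sketch of this point. It assumes the centered log-ratio (clr) transform of Aitchison, which is a common choice in compositional data analysis but is not named in the abstract. After the clr transform every observation sums to zero, so the all-ones vector lies in the null space of the covariance matrix and the covariance is degenerate. The helpers `clr`, `close` and `sample_cov` and the simulated gamma-based compositions are hypothetical and for illustration only.

```python
import math
import random


def clr(row):
    """Centered log-ratio transform of one strictly positive composition."""
    logs = [math.log(v) for v in row]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def close(row):
    """Rescale a positive vector so that its entries sum to one."""
    total = sum(row)
    return [v / total for v in row]


def sample_cov(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]


random.seed(0)
n, p = 40, 6
# Hypothetical compositions: normalized gamma draws (a Dirichlet sample).
comp = [close([random.gammavariate(2.0, 1.0) for _ in range(p)]) for _ in range(n)]
z = [clr(r) for r in comp]  # every clr row sums to zero
cov = sample_cov(z)
# Row sums of cov equal cov @ (1, ..., 1); they vanish, so cov is singular.
null_check = [sum(row) for row in cov]
print(max(abs(v) for v in null_check) < 1e-12)  # True
```

This degeneracy means that results derived for a nonsingular population covariance matrix cannot be carried over to compositional data without modification.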

For mean testing, existing max-type test statistics are designed mainly for sparse high-dimensional compositional data. When the signals are weak and dense, their power drops considerably. Sum-type test statistics are better suited to dense alternatives, but most existing sum-type statistics rely on an independent component model and therefore do not apply to high-dimensional compositional data. We therefore modify existing sum-type tests so that they apply to one-sample and two-sample mean testing of high-dimensional compositional data under more general conditions. We further show that the sum-type and max-type test statistics are asymptotically independent, and on this basis we propose a max-sum combination test that handles both sparse and dense data. We establish the asymptotic null distributions of these test statistics and analyze their power under the alternative hypothesis. Both theoretical derivations and numerical simulations indicate that the proposed max-sum test performs robustly regardless of the sparsity of the signal.
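Because the sum-type and max-type statistics are asymptotically independent, their p-values can be combined into one max-sum test. The hedged Python sketch below rests on three assumptions. First, the max-type statistic M has the Gumbel-type limit used by Cai, Liu and Xia (2014), P(M - 2 log p + log log p <= x) -> exp(-exp(-x/2)/sqrt(pi)). Second, the sum-type statistic is standardized to an N(0, 1) limit. Third, the two tests are combined through the minimum p-value. The thesis's modified compositional statistics and its exact combination rule may differ, and all function names are hypothetical.

```python
import math


def max_type_pvalue(m_stat: float, p: int) -> float:
    """Upper-tail p-value from the Gumbel-type limit
    P(M - 2 log p + log log p <= x) -> exp(-exp(-x / 2) / sqrt(pi))."""
    x = m_stat - 2.0 * math.log(p) + math.log(math.log(p))
    return 1.0 - math.exp(-math.exp(-x / 2.0) / math.sqrt(math.pi))


def sum_type_pvalue(z_stat: float) -> float:
    """Upper-tail p-value of a standardized sum-type statistic with an N(0, 1) limit."""
    return 0.5 * math.erfc(z_stat / math.sqrt(2.0))


def max_sum_pvalue(p_max: float, p_sum: float) -> float:
    """Minimum-p combination, justified by asymptotic independence:
    for independent uniforms U1, U2, P(min(U1, U2) <= t) = 1 - (1 - t)^2."""
    return 1.0 - (1.0 - min(p_max, p_sum)) ** 2
```

For example, `max_sum_pvalue(0.01, 0.30)` is about 0.0199. At the 5% level the combined test therefore rejects whenever either component p-value falls below 1 - sqrt(0.95), roughly 0.0253. A sparse signal can trigger rejection through the max-type p-value, and a dense one through the sum-type p-value.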

We then consider the sphericity test for the covariance matrix of high-dimensional compositional data. We take John's test statistic, a classical tool of multivariate analysis, and modify it so that it applies to high-dimensional compositional data. To derive the asymptotic distribution of the modified John's test statistic, we generalize the central limit theorem for linear spectral statistics of sample covariance matrices of independent-component data so that it also covers a degenerate population covariance matrix, which includes the case of high-dimensional compositional data. Numerical simulations show that the modified John's test statistic keeps the empirical size under control while maintaining good power.
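For reference, the classical statistic that the thesis modifies is John's (1971) U = (1/p) tr[(S/((1/p) tr S) - I)^2], where S is the sample covariance matrix. The Python sketch below computes U and the normal-theory standardization of Ledoit and Wolf (2002). Under that standardization, (N U - p - 1)/2 with N = n - 1 is approximately N(0, 1) under sphericity when p/n converges. This is the unmodified test for Gaussian data with a nonsingular covariance, not the thesis's compositional version. The function names and simulated data are hypothetical.

```python
import random


def sample_cov(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)]
            for i in range(p)]


def john_statistic(rows):
    """Classical John's statistic U = (1/p) tr[(S / (tr(S)/p) - I)^2]."""
    s = sample_cov(rows)
    p = len(s)
    scale = sum(s[i][i] for i in range(p)) / p  # tr(S) / p
    # D = S / scale - I is symmetric, so tr(D @ D) is the sum of squared entries of D.
    return sum((s[i][j] / scale - (1.0 if i == j else 0.0)) ** 2
               for i in range(p) for j in range(p)) / p


def john_standardized(rows):
    """Ledoit and Wolf (2002) normal-theory standardization (N U - p - 1) / 2, N = n - 1.
    Approximately N(0, 1) under sphericity for Gaussian data as p / n converges."""
    n, p = len(rows), len(rows[0])
    return ((n - 1) * john_statistic(rows) - p - 1.0) / 2.0


random.seed(1)
n, p = 200, 30
null_rows = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
scales = [1.0 + 2.0 * j / (p - 1) for j in range(p)]  # standard deviations from 1 to 3
alt_rows = [[s * random.gauss(0.0, 1.0) for s in scales] for _ in range(n)]
z_null = john_standardized(null_rows)  # typically of order 1
z_alt = john_standardized(alt_rows)    # large: the covariance is not spherical
```

Applied unchanged to clr-transformed compositions, S has rank at most p - 1. The trace normalization then no longer matches a spherical target on the full p-dimensional space, which is why a modified statistic and a CLT for degenerate population covariance matrices are needed.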

Keywords
Language
English
Training mode
Independent training
Year of enrollment
2022
Year of degree conferral
2024-06
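Degree evaluation subcommittee
Mathematics
Chinese Library Classification number
O212.1
Source repository
Manual submission
Output type
Degree thesis
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/778956
Collection
College of Science_Department of Statistics and Data Science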
Recommended citation
GB/T 7714
Li WB. TESTS OF HIGH-DIMENSIONAL COMPOSITIONAL DATA[D]. Shenzhen: Southern University of Science and Technology, 2024.
Files in this item
12232887-李文博-统计与数据科学 (2551 KB), restricted access

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.