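Title | On Eigenvalue Statistics of Large Sample Covariance Matrices and Spearman Correlation Matrices |
Name | |
Name (pinyin) | QIU Jiaxin |
Student ID | 12050012 |
Degree type | Doctoral |
Degree discipline | Statistics |
Supervisor | |
Supervisor's affiliation | Department of Statistics and Data Science |
External supervisor | 李国栋 |
External supervisor's affiliation | The University of Hong Kong |
Thesis defense date | 2024-08-20 |
Thesis submission date | 2024-09-20 |
Degree-granting institution | The University of Hong Kong |
Place of degree conferral | Hong Kong, China |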
Abstract | Random matrix theory (RMT) is a mathematical framework, originating in mathematical physics, that studies the properties of matrices with random entries. RMT is widely used in statistics and provides valuable insights into high-dimensional problems. This thesis studies the eigenvalue statistics of two key random matrices, the sample covariance matrix and the Spearman correlation matrix, together with their applications in high-dimensional statistics. In the first part of the thesis, we derive the asymptotic normality of a large family of eigenvalue statistics of a general sample covariance matrix in the ultra-high dimensional setting, that is, when the dimension-to-sample-size ratio p/n grows to infinity. Based on this central limit theorem (CLT), we extend the covariance matrix testing problem to the ultra-high dimensional context and apply it to testing for matrix-valued white noise. Simulation experiments are conducted to investigate the finite-sample properties of the general asymptotic normality of the eigenvalue statistics, as well as of the two tests developed. In the second part of the thesis, we study the spectral properties of the Spearman sample correlation matrix in the high-dimensional setting, where the dimension and the sample size tend to infinity proportionally. Based on the theoretical result, we propose an estimator of the number of common factors in a high-dimensional factor model. This estimator is robust against heavy tails in either the common factors or the idiosyncratic errors, and its consistency is established under mild assumptions. Extensive numerical experiments and real data analyses show that our estimator outperforms existing methods.
In summary, this thesis contributes to high-dimensional statistics by investigating the eigenvalue statistics of the sample covariance matrix and the Spearman correlation matrix, addressing the challenges that ultra-high dimensionality and heavy-tailed data distributions pose in real-world scenarios. |
Keywords | |
Language | English |
Training category | Joint training |
Year of enrollment | 2020 |
Year of degree conferral | 2024-09 |
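Source database | Manual submission |
Output type | Thesis |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/828875 |
Collection | College of Science_Department of Statistics and Data Science |
Recommended citation (GB/T 7714) | Qiu JX. On Eigenvalue Statistics of Large Sample Covariance Matrices and Spearman Correlation Matrices[D]. Hong Kong, China. The University of Hong Kong, 2024. |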
Files in this item |
File name/size | Document type | Version type | Access type | License | Action |
12050012-邱佳鑫-统计与数据科学 (2410 KB) | -- | -- | Restricted access | -- | Request full text |
Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.