Title

DIMENSION REDUCTION METHODS FOR COMPLEX DATA ANALYSIS

Alternative Title (Chinese)
复杂数据分析中的降维方法
Name
何帅达
Name (Pinyin)
HE Shuaida
Student ID
12031307
Degree Type
Doctoral
Degree Discipline
0701 Mathematics
Discipline Category
07 Science
Supervisor
CHEN XIN
Supervisor's Affiliation
Department of Statistics and Data Science
Thesis Defense Date
2024-05-19
Thesis Submission Date
2024-06-18
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract

Dimension reduction plays a crucial role in handling high-dimensional data. Recent technological advancements have substantially increased both the dimensionality and the complexity of data collected across various scientific fields. These developments pose new challenges for traditional dimension reduction methods, especially given the additional constraints that modern applications impose on data usage. This dissertation addresses these challenges by developing novel dimension reduction techniques tailored to complex data analysis scenarios, focusing in turn on private, non-Euclidean, and heavy-tailed data.

In the first study, we propose a sufficient dimension reduction (SDR) method to address the challenges of decentralized data, prioritizing privacy and communication efficiency. Our approach, named federated sliced inverse regression (FSIR), enables distributed estimation of the SDR subspace across multiple clients, sharing only local estimates so that sensitive datasets are never exposed. To guard against potential adversarial attacks, FSIR further employs diverse perturbation strategies, including a novel vectorized Gaussian mechanism that guarantees differential privacy at a small cost in statistical accuracy. Additionally, FSIR naturally incorporates a collaborative feature screening step, enabling effective handling of high-dimensional client data. Theoretical properties of FSIR are established for both low-dimensional and high-dimensional settings, supported by extensive numerical experiments and real data analysis. Furthermore, although the SDR subspace lies on a Grassmann manifold, little research has explored how privacy attacks affect estimates of such a parameter. Under the principal fitted component (PFC) model, we introduce a specific tracing attack to demonstrate the necessity of privacy protection for SDR estimates. As a defense strategy, we adapt the classical gradient algorithm for optimization on the Grassmann manifold to ensure differential privacy.
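To make the federated idea concrete, the following is a minimal sketch, not the dissertation's FSIR algorithm: each client forms a local sliced-inverse-regression candidate matrix, perturbs it with a generic Gaussian mechanism (the vectorized mechanism, collaborative screening, and high-dimensional refinements described above are omitted), and the server averages the perturbed matrices and extracts leading eigenvectors. All function names, the sensitivity value, and the privacy parameters are illustrative assumptions.

```python
import numpy as np

def local_sir_matrix(X, y, n_slices=5):
    """Local SIR candidate matrix on one client: standardize X, slice by y,
    and accumulate the weighted outer products of the slice means."""
    n, p = X.shape
    L = np.linalg.cholesky(np.cov(X, rowvar=False))
    Z = (X - X.mean(axis=0)) @ np.linalg.inv(L).T        # standardized predictors
    M = np.zeros((p, p))
    for idx in np.array_split(np.argsort(y), n_slices):  # slices of the response
        m_h = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m_h, m_h)
    return M

def gaussian_mechanism(M, sensitivity, eps, delta, rng):
    """Generic Gaussian mechanism: symmetric noise calibrated for (eps, delta)-DP."""
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps
    E = rng.normal(0.0, sigma, size=M.shape)
    return M + (E + E.T) / 2.0

def federated_sir(clients, d, eps=1.0, delta=1e-5, sensitivity=1.0, seed=0):
    """Average the privately perturbed local matrices; the top-d eigenvectors
    span the estimated SDR subspace (in the standardized scale)."""
    rng = np.random.default_rng(seed)
    M_bar = np.mean(
        [gaussian_mechanism(local_sir_matrix(X, y), sensitivity, eps, delta, rng)
         for X, y in clients],
        axis=0,
    )
    _, eigvecs = np.linalg.eigh(M_bar)
    return eigvecs[:, -d:]
```

In such a scheme only the perturbed local matrices would ever leave the clients, which is the communication and privacy pattern the paragraph above describes.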

In the second study, we focus on feature selection for non-Euclidean data, which lack a vector structure and pose significant challenges for statistical analysis. In the context of binary classification, we introduce a data-adaptive filtering procedure that identifies informative features from a large collection of random objects, leveraging a novel Kolmogorov-Smirnov-type statistic defined on the metric space. Our method applies to data in general metric spaces with binary labels and enjoys a model-free property. Theoretically, it guarantees the sure screening property while controlling the false discovery rate. When applied to an autism dataset, our method identifies significant brain regions associated with the condition. Moreover, by filtering hundreds of thousands of covariance matrices representing various brain connectivities, it reveals distinct interaction patterns among these regions between individuals with and without autism.
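As a rough illustration of the screening idea (not the dissertation's exact procedure), the sketch below scores one metric-space feature by comparing, over anchor objects and radii, the within-class fractions of objects falling in metric balls, and then ranks features by this Kolmogorov-Smirnov-type score. The data-adaptive threshold that controls the false discovery rate is replaced here by a simple top-k cutoff; all names are hypothetical.

```python
import numpy as np

def ks_type_statistic(D, labels):
    """KS-type discrepancy for one metric-space feature.
    D: (n, n) pairwise distance matrix among the observed objects.
    labels: binary 0/1 array. For every anchor object u and radius r taken from
    the observed distances, compare the fractions of class-0 and class-1 objects
    falling in the metric ball B(u, r); return the largest gap found."""
    labels = np.asarray(labels)
    g0, g1 = labels == 0, labels == 1
    n0, n1 = g0.sum(), g1.sum()
    stat = 0.0
    for u in range(D.shape[0]):
        radii = np.sort(D[u])
        F0 = np.searchsorted(np.sort(D[u, g0]), radii, side="right") / n0
        F1 = np.searchsorted(np.sort(D[u, g1]), radii, side="right") / n1
        stat = max(stat, float(np.abs(F0 - F1).max()))
    return stat

def screen_features(distance_matrices, labels, top_k=10):
    """Score every candidate feature and keep the top_k
    (an illustrative cutoff, not the data-adaptive FDR-controlling threshold)."""
    scores = np.array([ks_type_statistic(D, labels) for D in distance_matrices])
    keep = np.argsort(scores)[::-1][:top_k]
    return keep, scores
```

Because only pairwise distances enter, the same score can be computed for covariance matrices, networks, or any other objects for which a metric is available.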

In the third study, we address direction estimation in single-index models, with a focus on heavy-tailed data applications. Our method utilizes cumulative divergence to directly capture the conditional mean dependence between the response variable and the index feature, resulting in a model-free property that obviates the need for initial link function estimation. Furthermore, it allows heavy-tailed predictors and is robust against outliers, leveraging the rank-based nature of cumulative divergence. We establish theoretical properties for our proposal under mild regularity conditions and illustrate its solid performance through comprehensive simulations and real data analysis.
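The sketch below conveys the flavor of the rank-based criterion under stated assumptions: it uses a simplified, unnormalized cumulative-covariance-type objective rather than the dissertation's exact cumulative divergence estimator, and a crude random search over unit vectors in place of the actual optimization algorithm. All names and the optimizer are illustrative.

```python
import numpy as np

def cumulative_criterion(beta, X, y):
    """Plug-in estimate of a cumulative-covariance-type criterion between y and
    the index X @ beta: average over i of the squared sample covariance between
    y and the indicator 1{X @ beta <= (X @ beta)[i]}. The predictors enter only
    through the ordering of the index values, which is the source of the
    robustness to heavy tails and outliers in X."""
    idx = X @ beta
    yc = y - y.mean()
    indic = (idx[None, :] <= idx[:, None]).astype(float)  # indic[i, j] = 1{idx_j <= idx_i}
    cov_terms = indic @ yc / len(y)                        # Cov_n(y, 1{idx <= idx_i}) for each i
    return float(np.mean(cov_terms ** 2))

def estimate_direction(X, y, n_candidates=5000, seed=0):
    """Crude random search over unit vectors (illustration only); the direction
    is identified up to sign and scale, so candidates live on the unit sphere."""
    rng = np.random.default_rng(seed)
    cands = rng.normal(size=(n_candidates, X.shape[1]))
    cands /= np.linalg.norm(cands, axis=1, keepdims=True)
    scores = [cumulative_criterion(b, X, y) for b in cands]
    return cands[int(np.argmax(scores))]
```

No link function is fitted at any point, which reflects the model-free character of the direction estimate described above.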

Alternative Abstract

Dimension reduction is an essential tool for analyzing high-dimensional data. With advances in technology, modern data keep growing in dimensionality and structural complexity; at the same time, emerging application scenarios impose various constraints on how data may be used and analyzed. These trends challenge the effectiveness and reliability of traditional dimension reduction techniques. Motivated by the needs of high-dimensional complex data analysis, this dissertation proposes several new dimension reduction methods, with emphasis on applications to private data, non-Euclidean data, and heavy-tailed data.

The first part of the dissertation targets privacy-sensitive data stored in a distributed manner and develops a new class of sufficient dimension reduction methods, called federated sliced inverse regression (FSIR). FSIR jointly estimates the dimension reduction subspace across multiple data nodes. During the computation, each node shares only its local estimates, so no sensitive data are transmitted; moreover, FSIR combines several noise perturbation techniques, including a novel vectorized Gaussian mechanism, to guarantee the differential privacy of the estimates. Through the collaboration of the nodes, FSIR can also rapidly screen ultrahigh-dimensional features. We establish the effectiveness of FSIR through theoretical proofs, simulation studies, and real data analysis.
In addition, the dimension reduction subspace can be viewed as an element of a Grassmann manifold, and few studies have examined how privacy attacks affect estimates of this special type of parameter. Within the framework of the principal fitted component (PFC) model, we propose a tracing attack that demonstrates the necessity of protecting the privacy of dimension reduction subspace estimates, and we provide a fitting algorithm for the PFC model under privacy constraints.

The second part of the dissertation considers the screening of large-scale non-Euclidean features. The intrinsic complexity of non-Euclidean data and their lack of a vector structure make it very difficult to model massive collections of such features. We therefore design, in the setting of binary classification, a nonparametric screening method for massive features taking values in general metric spaces, aimed at identifying the key features relevant to classification. The method relies on a Kolmogorov-Smirnov-type statistic defined on the metric space, avoids dependence on any specific classification model, and enjoys the sure screening property in theory. Combined with a data-driven threshold selection algorithm, it effectively controls the false discovery rate of the screening procedure. In a case study of autism data, the method filters hundreds of thousands of covariance matrices characterizing different brain connectivity structures, successfully identifies key brain regions associated with autism, and reveals significant differences in the interaction patterns among these regions between the patient and control groups.

The third part of the dissertation studies direction estimation in single-index models for heavy-tailed data. As a flexible generalization of the linear model, the single-index model is widely used in many fields; it is also often viewed as a special case of sufficient dimension reduction, in which the single index specifies the direction of the reduction. Motivated by the heavy tails frequently observed in real data and the impact of outliers on model estimation, we propose a robust direction estimation strategy: the single-index direction is estimated by maximizing the cumulative divergence between the response and the index. This approach directly characterizes the conditional mean dependence in the single-index model, avoids estimating the link function, and therefore enjoys a model-free property. Owing to the rank structure of the cumulative divergence, the method accommodates heavy-tailed predictors and handles outliers robustly. We establish the consistency of the method under mild regularity conditions and validate its performance through simulation studies and real data analysis.

Keywords
Language
English
Training Category
Independent training
Year of Enrollment
2020
Year of Degree Conferral
2024-06

Degree Evaluation Subcommittee
Mathematics
Chinese Library Classification Number
O212.1
Source Repository
Manual submission
Item Type
Dissertation
Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/765626
Collection
Southern University of Science and Technology, College of Science, Department of Statistics and Data Science
Recommended Citation
GB/T 7714
He SD. DIMENSION REDUCTION METHODS FOR COMPLEX DATA ANALYSIS[D]. Shenzhen: Southern University of Science and Technology, 2024.
Files in This Item
File Name/Size    Document Type    Version Type    Access Type    License
12031307-何帅达-统计与数据科学 (15225 KB)    --    --    Restricted access    --