Title

QUANTILE REGRESSION FOR ANALYZING CORRELATED DATA IN ULTRA-HIGH DIMENSION

Alternative Title
超高维相关数据的分位数回归及其推断
Name
梁亚坤
Name (Pinyin)
LIANG Yakun
Student ID
12031265
Degree Type
Doctoral
Degree Discipline
0701 Mathematics
Discipline Category
07 Science
Supervisor
蒋学军
Supervisor's Department
Department of Statistics and Data Science
Thesis Defense Date
2024-05-22
Thesis Submission Date
2024-06-19
Degree-Granting Institution
南方科技大学 (Southern University of Science and Technology)
Place of Degree Conferral
深圳 (Shenzhen)
Abstract

Data plays a crucial role in driving statistical development. Statistics is dedicated to developing effective methods for mining data and extracting information, and to providing solid theoretical support for them. However, rapid progress in science and technology has intensified the complexity of data, which now commonly exhibits ultrahigh dimensionality, strong correlations, non-normality, outliers, and incompleteness. These characteristics bring new challenges and opportunities to the development of statistics and data science. Notably, the intercorrelation prevalent in high-dimensional data severely undermines the stability of traditional variable selection methods, and consequently high-dimensional inference procedures that rely on penalized estimation may fail. Robust statistical methods are therefore urgently needed. This thesis focuses on ultrahigh-dimensional statistical analysis and investigates feature screening, variable selection, and hypothesis testing within quantile regression models. By exploring the theoretical foundations and practical methodology of quantile regression in ultrahigh-dimensional data analysis, this thesis aims to provide new strategies and insights for handling complex data.

Feature screening is a valuable approach for reducing data dimensionality. To reduce ultrahigh-dimensional correlated features to a manageable dimension, this thesis introduces a novel approach, the Quantile Ridge Regression (QRR) screener. Ridge regression, a penalized estimator known to be friendly to collinearity, serves as the bridge to feature screening. By establishing the Bahadur representation of the QRR estimator, this thesis shows that the QRR screener possesses the sure screening property: in ultrahigh-dimensional settings it retains all important variables with probability tending to one. To address the computational challenge posed by the non-smoothness of the quantile loss function, this thesis applies an accelerated and generalized Alternating Direction Method of Multipliers (ADMM) to solve the ultrahigh-dimensional quantile ridge regression problem efficiently. Simulation results underscore the superiority of the QRR screener over existing screening methods based on marginal or conditional measures, with a higher correct-screening rate for a given model size. Moreover, the proposed method is not limited to traditional linear regression but also applies to survival analysis: by leveraging inverse probability weighting, the censored QRR screener outperforms existing methods on right-censored data.
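As a concrete illustration of the optimization behind the screener, the sketch below implements a plain (neither accelerated nor generalized) ADMM for quantile ridge regression and then ranks features by coefficient magnitude. The function names `qrr_admm` and `qrr_screen`, the fixed iteration count, and the dense linear solve are illustrative assumptions; the thesis's accelerated generalized ADMM and its ultrahigh-dimensional implementation would differ.

```python
import numpy as np

def qrr_admm(X, y, tau=0.5, lam=1.0, rho=1.0, n_iter=200):
    """Quantile ridge regression via a basic ADMM split:
    minimize sum_i rho_tau(y_i - x_i' beta) + lam * ||beta||_2^2
    with auxiliary residual r and the constraint X beta + r = y."""
    n, p = X.shape
    beta, r, u = np.zeros(p), y.copy(), np.zeros(n)   # u: scaled dual variable
    # Cache the p x p system of the ridge-type beta-update; for p >> n one
    # would instead invert an n x n matrix via the Woodbury identity.
    G = 2.0 * lam * np.eye(p) + rho * X.T @ X
    for _ in range(n_iter):
        # beta-update: ridge regression with working response y - r - u
        beta = np.linalg.solve(G, rho * X.T @ (y - r - u))
        # r-update: elementwise proximal operator of the check loss rho_tau
        v = y - X @ beta - u
        r = np.where(v > tau / rho, v - tau / rho,
            np.where(v < -(1.0 - tau) / rho, v + (1.0 - tau) / rho, 0.0))
        # dual ascent on the constraint X beta + r = y
        u += X @ beta + r - y
    return beta

def qrr_screen(X, y, d, tau=0.5, lam=1.0):
    """Rank features by |beta_j| from the QRR fit and keep the top d."""
    beta = qrr_admm(X, y, tau=tau, lam=lam)
    return np.argsort(-np.abs(beta))[:d]
```

The r-update is the closed-form proximal operator of the check loss, which is what lets ADMM sidestep the non-smoothness of the quantile objective.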

Variable selection is essential for achieving unbiased estimation. Building on the QRR screener, this thesis further proposes a two-stage variable selection method, QRR-LIPS, which relies on a likelihood-based discrepancy measure. The method identifies important variables accurately and efficiently by performing nested signal detection on the ranked features. However, inactive variables that are strongly correlated with important ones can produce a placeholder effect that interferes with selection accuracy. To address this issue, this thesis introduces a greedy algorithm that combines forward search with a competition mechanism, enhancing the vanilla QRR-LIPS method. This improvement endows the selection process with an error-correction mechanism, preventing overfitting and underfitting in extreme cases. Theoretically, this thesis proves the strong selection consistency of both the vanilla and the enhanced QRR-LIPS. Simulation results show that the two-stage QRR-LIPS methods are computationally far more efficient, and select variables more accurately, than penalized methods applied to either the full feature space or the post-screening reduced space. Furthermore, the enhanced QRR-LIPS is more robust under dependent designs and copes effectively with complex correlation structures in the data. Notably, the proposed methods extend beyond ordinary linear regression to semiparametric and nonparametric models: via B-spline approximation, the two-stage methods handle high-dimensional varying coefficient models and high-dimensional additive models with correlated data, outperforming existing methods in finite samples.
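To illustrate the second stage, the sketch below performs nested signal detection along the screening ranking, using a BIC-type quantile criterion as one plausible instance of a likelihood-based discrepancy measure. The helper `lips_select`, the exact penalty form, and the use of `statsmodels` for the quantile fits are our assumptions; the forward-search/competition enhancement is not shown.

```python
import numpy as np
import statsmodels.api as sm

def check_loss(u, tau):
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0}), summed."""
    return np.sum(u * (tau - (u < 0)))

def lips_select(X, y, ranked_idx, tau=0.5, max_size=30):
    """Nested signal detection: fit quantile regressions on the nested
    models formed by the top-k ranked features and pick the size that
    minimizes a BIC-type criterion (one choice of discrepancy measure)."""
    n, p = X.shape
    best_k, best_crit = 1, np.inf
    for k in range(1, min(max_size, len(ranked_idx)) + 1):
        Z = sm.add_constant(X[:, ranked_idx[:k]])
        fit = sm.QuantReg(y, Z).fit(q=tau)
        loss = check_loss(y - fit.predict(Z), tau)
        # Penalty inflated by log p to cope with ultrahigh dimension.
        crit = np.log(loss) + k * np.log(n) * np.log(p) / (2.0 * n)
        if crit < best_crit:
            best_k, best_crit = k, crit
    return ranked_idx[:best_k]
```

Because the candidate models are nested along one ranking, only `max_size` low-dimensional fits are needed, which is the source of the two-stage method's computational advantage over penalization on the full feature space.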

Hypothesis testing provides the theoretical guarantee for statistical inference. To establish rigorous and reliable testing procedures, this thesis proposes a Wald-type framework for testing linear structures on the parameters of interest in high-dimensional regression models. For estimation, it employs a convolution-smoothed quantile regression framework, which resolves the non-smoothness of the empirical loss function. To ensure valid inference, it adopts sample splitting: model selection is performed on one subsample, parameter estimation on the other, and cross-fitting combines the statistics from the two. However, without the oracle property, spurious and redundant parameters often bias the covariance matrix estimate and thus reduce the power of the test. To address this challenge, this thesis proposes an information-weighted fused estimator, which combines the information from the two subsamples through matrix weighting and thereby reduces estimation bias. Theoretically, this thesis proves the asymptotic normality of the fused estimator and derives the limiting distribution of the resulting Wald-type test statistic under both the null hypothesis and local alternatives. Empirical results show that the proposed test controls the Type I error and achieves high power across a range of linear hypotheses.
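The fusion step admits a generic illustration: given the two subsample estimates of the target parameters and their covariance estimates, a precision(information)-weighted average and the associated Wald statistic can be formed as sketched below. This assumes the standard matrix-weighted combination of unbiased estimators; the thesis's specific weighting and covariance construction may differ.

```python
import numpy as np
from scipy.stats import chi2

def fuse(beta1, Sigma1, beta2, Sigma2):
    """Matrix(information)-weighted fusion of two subsample estimators:
    each estimate is weighted by its inverse covariance (precision)."""
    P1, P2 = np.linalg.inv(Sigma1), np.linalg.inv(Sigma2)
    Sigma = np.linalg.inv(P1 + P2)        # covariance of the fused estimate
    beta = Sigma @ (P1 @ beta1 + P2 @ beta2)
    return beta, Sigma

def wald_test(beta, Sigma, C, t):
    """Wald statistic for the linear hypothesis H0: C beta = t,
    with C of full row rank, referred to its chi-square limit."""
    diff = C @ beta - t
    W = float(diff @ np.linalg.solve(C @ Sigma @ C.T, diff))
    return W, chi2.sf(W, df=C.shape[0])   # df = rank(C)
```

Weighting each subsample estimate by its precision matrix downweights the noisier subsample coordinate-wise, which is how the fused estimator reduces the bias that plain averaging would inherit.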

Data analysis serves as the ultimate aim of statistical research. To demonstrate the practical applicability of the proposed methods, this thesis applies the proposed suite of quantile regression methods for ultrahigh-dimensional correlated data to feature screening, variable selection, and targeted inference for ultrahigh-dimensional anticancer drug sensitivity genes. The outcomes of this analysis align with the existing biomedical literature, providing valuable insights for understanding drug action mechanisms and optimizing clinical treatment strategies.

Other Abstract

Data is the driving force behind the development of statistics. Statistics is devoted to developing effective methods for mining data and distilling information, and to providing solid theoretical support for them. With the rapid development of science and technology, data in all fields exhibit many complex characteristics, such as ultrahigh dimensionality, strong correlation, non-normal distributions, the presence of outliers, and incompleteness. These characteristics bring new opportunities and challenges to statistics and data science. The correlation prevalent in high-dimensional data severely affects the stability of traditional variable selection methods, which in turn can invalidate the high-dimensional inference methods that rely heavily on penalized estimation. Faced with such imperfect data, more robust statistical methods are urgently needed. Against the background of ultrahigh-dimensional data, this thesis studies feature screening, variable selection, and hypothesis testing for quantile regression models. By exploring in depth the theoretical foundations and practical methodology of quantile regression in ultrahigh-dimensional data analysis, it aims to provide new ideas and strategies for handling complex data.

Feature screening is a reliable route to data reduction. To bring ultrahigh-dimensional correlated data down to a manageable dimension, this thesis proposes the Quantile Ridge Regression (QRR) screener. Ridge regression is a collinearity-friendly penalized estimator and builds the bridge to feature screening. By establishing the Bahadur representation of the QRR estimator, this thesis proves that the QRR screener possesses the sure screening property, which guarantees that in ultrahigh-dimensional settings the screener retains all important variables with probability tending to one. To overcome the computational challenge caused by the non-differentiability of the quantile loss function, this thesis applies a generalized, accelerated Alternating Direction Method of Multipliers (ADMM) to solve the ultrahigh-dimensional quantile ridge regression problem efficiently. Simulation results show that the QRR screener is clearly superior to existing screening methods based on marginal or conditional measures, with a higher correct-screening rate at a given model size. The proposed screening method is not confined to traditional linear regression and can also be applied to survival analysis: using inverse probability weighting, it handles censored data effectively and shows advantages over existing methods.

Variable selection is the necessary path to accurate estimation. Following the successful application of the QRR screener, this thesis further proposes a variable selection method based on a likelihood discrepancy measure (the Likelihood-Based Post-Screening Selection Algorithm, LIPS), which pairs with QRR to form the two-stage QRR-LIPS method. By carrying out nested signal detection on the ranked features, the method isolates the important variables accurately and quickly. However, spurious variables that are strongly correlated with important ones tend to produce a placeholder effect, interfering with the accuracy of variable selection. To solve this problem, this thesis introduces a greedy algorithm that combines forward search with a competition mechanism, improving the QRR-LIPS method. This improvement endows the variable selection process with an error-correction mechanism and prevents underfitting and overfitting in extreme cases. Theoretically, this thesis proves the strong selection consistency of both the vanilla and the improved QRR-LIPS. Simulation results show that, compared with penalized methods based on either the full feature space or the reduced post-screening space, the two-stage QRR-LIPS methods achieve clearly higher computational efficiency and more accurate variable selection. Moreover, the improved QRR-LIPS performs more robustly under dependent designs and copes effectively with complex correlation structures in the data. The proposed two-stage QRR-LIPS methods are not restricted to simple linear regression and extend to semiparametric and nonparametric regression models: via B-spline approximation, they effectively solve variable selection for high-dimensional varying coefficient models and high-dimensional additive models with correlated data, outperforming existing methods in finite samples.

Hypothesis testing is the theoretical safeguard of statistical inference. To construct rigorous and reliable testing procedures, this thesis proposes a Wald-type test for linear structures on the parameters of interest in high-dimensional regression models. In the estimation step it adopts the convolution-smoothed quantile regression framework, which neatly resolves the non-differentiability of the empirical loss function. To ensure valid inference, it borrows the idea of sample splitting, performing model selection on one subset and parameter estimation on the other, and uses cross-fitting to combine the statistics. However, for estimators lacking the oracle property, redundant parameters often bias the estimate of the covariance matrix and thereby reduce the power of the test statistic. To meet this challenge, this thesis proposes an information-weighted fusion estimator, which fully combines the information in the two subsamples through matrix weighting and effectively reduces the estimation bias. Theoretically, this thesis proves the asymptotic normality of the fused estimator and derives the asymptotic distributions of the Wald-type statistic built on it under the null and local alternative hypotheses. Simulation results show that the proposed statistic controls the Type I error and attains high power under a series of linear hypotheses.

Data analysis is the ultimate aim of statistical research. To demonstrate the practical value of the proposed methods, this thesis applies the series of quantile regression methods for ultrahigh-dimensional correlated data to the selection of ultrahigh-dimensional anticancer drug sensitivity genes and to inference on target genes. The analysis results agree with the existing biomedical literature and provide valuable insights for a deeper understanding of drug action mechanisms and the optimization of clinical treatment plans.

Keywords
Other Keywords
Language
English
Training Category
Independent training
Year of Enrollment
2020
Year of Degree Conferral
2024-06

Degree Evaluation Subcommittee
Mathematics
Chinese Library Classification
O212.1
Source Repository
Manual submission
Document Type
Dissertation
Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/765680
Collection
南方科技大学
理学院_统计与数据科学系
Recommended Citation (GB/T 7714)
Liang YK. QUANTILE REGRESSION FOR ANALYZING CORRELATED DATA IN ULTRA-HIGH DIMENSION[D]. 深圳: 南方科技大学, 2024.
Files in This Item
12031265-梁亚坤-统计与数据科学 (3208 KB), restricted access (full text available upon request)