Title

CAUSAL INFERENCE FOR SURVIVAL OUTCOMES WITH MULTIPLE CAUSES

Alternative Title
Application of Multi-Cause Causal Inference in Survival Analysis
Name
Name (Pinyin)
XIE Yutao
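Student ID
12032848
Degree Type
Master's
Degree Program
0701 Mathematics
Discipline Category/Professional Degree Category
07 Science
Supervisor
徐匆
Supervisor's Affiliation
Department of Statistics and Data Science
Thesis Defense Date
2022-05-07
Thesis Submission Date
2022-06-21
Degree-Granting Institution
Southern University of Science and Technology
Degree-Granting Location
Shenzhen
Abstract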

Causal inference is widely used in modern statistical research. Introducing causal inference methods into survival analysis makes it possible to estimate causal relationships between variables rather than associations alone. In this thesis, we introduce the background, history, and basic concepts of causal inference and of survival analysis. In traditional causal inference, the randomized controlled trial is the gold standard for establishing causality. For observational data, four assumptions are needed to identify causal relationships. The most important is the conditional exchangeability assumption, which requires that there be no unobserved confounders. This is a very strong assumption, and it is untestable. The deconfounder algorithm enables causal inference on observational data under a weaker assumption: that there is no unobserved single-cause confounder. The algorithm first captures the joint distribution of the observed causes with a latent variable model and then infers the value of the latent variable for each individual. The inferred latent variable is used as a substitute for the unobserved confounders in subsequent models to perform causal inference. In this thesis, we propose the multi-cause Cox regression model, which connects the deconfounder algorithm with the traditional Cox model. While the regression coefficients in the traditional Cox model describe only the statistical association between covariates and the hazard rate, the regression coefficients in our multi-cause Cox regression model more closely represent the causal relationship. The performance of the proposed model is demonstrated on a simulated dataset and on a real dataset. Compared with the traditional Cox model, the multi-cause Cox regression model achieves higher prediction accuracy. This indicates that the multi-cause Cox regression model is able to reduce the negative impact of unobserved multi-cause confounders and to capture the causal relationships in survival data more accurately.
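The two-step idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the simulated data, the parameter values, the use of a one-factor PCA as the latent variable model, and the helper name `fit_cox` are all assumptions made for illustration. The sketch infers a substitute confounder from the observed causes and then compares a Cox fit for one cause of interest with and without that substitute as an extra covariate.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Simulated data (every value here is an illustrative assumption) ---
n, m = 2000, 10                  # individuals, observed causes
beta1, gamma = 0.5, 1.0          # true effect of cause 1; effect of the confounder
z = rng.normal(size=n)           # unobserved multi-cause confounder
A = z[:, None] + rng.normal(size=(n, m))    # every cause depends on z
log_hazard = beta1 * A[:, 0] + gamma * z    # only cause 1 acts on survival
T = rng.exponential(scale=np.exp(-log_hazard))
C = np.quantile(T, 0.8)          # administrative censoring time
time = np.minimum(T, C)
event = (T <= C).astype(float)   # 1 = event observed, 0 = censored

# --- Step 1: latent variable model of the causes (one-factor PCA via SVD) ---
A_centered = A - A.mean(axis=0)
_, _, Vt = np.linalg.svd(A_centered, full_matrices=False)
z_hat = A_centered @ Vt[0]       # substitute confounder (sign is arbitrary)
z_hat = z_hat / z_hat.std()

# --- Step 2: Cox regression by Newton's method on the partial likelihood ---
def fit_cox(X, time, event, n_iter=25):
    """Maximize the Cox log partial likelihood (assumes no tied event times)."""
    order = np.argsort(-time)    # descending time: risk set of row i is rows 0..i
    X, event = X[order], event[order]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        w = np.exp(X @ beta)
        s0 = np.cumsum(w)                                   # risk-set sums
        s1 = np.cumsum(w[:, None] * X, axis=0)
        s2 = np.cumsum(w[:, None, None] * X[:, :, None] * X[:, None, :], axis=0)
        mean = s1 / s0[:, None]                             # risk-set weighted mean
        cov = s2 / s0[:, None, None] - mean[:, :, None] * mean[:, None, :]
        grad = ((X - mean) * event[:, None]).sum(axis=0)    # score vector
        info = (cov * event[:, None, None]).sum(axis=0)     # observed information
        beta = beta + np.linalg.solve(info, grad)           # Newton step
    return beta

# z_hat is an exact linear combination of all causes, so the outcome model
# here uses only the cause of interest plus z_hat to avoid collinearity.
naive = fit_cox(A[:, [0]], time, event)[0]
adjusted = fit_cox(np.column_stack([A[:, 0], z_hat]), time, event)[0]

print(f"true effect of cause 1:        {beta1:.3f}")
print(f"Cox without substitute:        {naive:.3f}")
print(f"Cox with substitute confounder: {adjusted:.3f}")
```

In this simulated setting the coefficient from the fit that includes `z_hat` lies closer to the true effect than the unadjusted one. A latent variable model other than PCA, or an outcome model containing all causes, would need a different treatment of the collinearity noted in the comments.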

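Alternative Abstract

In modern statistical research, the analytical methods of causal inference are widely applied. Introducing these methods into the analysis of survival data makes it possible to obtain the causal relationships between variables. In this thesis, we introduce the background, history, and basic concepts of causal inference and of survival analysis. In traditional causal inference, the randomized controlled trial is the gold standard for obtaining causal relationships, whereas for observational data four assumptions are required before causal relationships can be obtained by analyzing the data directly. One important assumption is conditional exchangeability, that is, the assumption that there are no unobserved confounders; this assumption is strong, and its validity cannot be tested. The deconfounder algorithm allows us to obtain causal relationships from observational data under a weaker assumption (that there are no unobserved single-cause confounders). The algorithm uses a latent variable model to capture the joint distribution of the observed causes, uses that model to infer the value of the latent variable for each individual, and then adds the inferred latent variable to subsequent models as a substitute for the unobserved confounders. Specifically, in this thesis we propose a multi-cause Cox regression model that combines the deconfounder algorithm with the Cox model. Whereas the regression coefficients in the traditional Cox model describe only the statistical association between covariates and survival risk, the regression coefficients from this model are closer to the causal relationship between covariates and survival risk. Finally, we apply the model to simulated data and to real data and compare the results with those of the traditional Cox model. The multi-cause Cox regression model has higher prediction accuracy, which indicates that the proposed model can reduce the negative impact of unobserved multi-cause confounders and capture the causal relationships in survival data more accurately.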

Keywords
Other Keywords
Language
English
Training Category
Independent training
Enrollment Year
2020
Year of Degree Conferral
2022-07

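Degree Evaluation Subcommittee
Department of Statistics and Data Science
Chinese Library Classification Number
O212.2
Source Database
Manual submission
Output Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/336381
Collection
College of Science_Department of Statistics and Data Science
Recommended Citation
GB/T 7714
Xie YT. CAUSAL INFERENCE FOR SURVIVAL OUTCOMES WITH MULTIPLE CAUSES[D]. Shenzhen. Southern University of Science and Technology, 2022.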
Files in This Item
File Name/Size | Document Type | Version Type | Access Type | License
12032848-谢宇涛-统计与数据科学 (1716KB) | Thesis | -- | Restricted access | CC BY-NC-SA