Title

对异质性肿瘤样本的基因表达数据反卷积计算方法的综合评价

Alternative Title
COMPREHENSIVE EVALUATION OF CELL-TYPE DECONVOLUTION METHODS FOR GENE EXPRESSION PROFILES OF HETEROGENEOUS TUMOR SAMPLES
Name
Student ID
11849328
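Degree Type
Master's
Degree Program
Engineering (Computer Technology)
Supervisor
郭昕
Thesis Defense Date
2020-06-02
Thesis Submission Date
2020-07-20
Degree-Granting Institution
Harbin Institute of Technology
Place of Degree Conferral
Shenzhen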
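Abstract
According to a 2018 report by the World Health Organization (WHO), cancer is the second leading cause of death globally, responsible for at least 9.6 million deaths worldwide. In recent years, both cancer incidence and the number of cancer deaths have continued to rise. Meanwhile, in machine learning and artificial intelligence, the continued development of new techniques such as deep learning has brought AI algorithms to prominence in bioinformatics, and researchers are eager to use computational methods to study the phenomena and laws of life. Cancer is a disease caused by a series of complex genetic mutations, so studying it at the genetic level is of great significance for understanding the mechanisms of tumor initiation and progression, and for early screening, diagnosis, and treatment. This project applies three deconvolution methods, NNLS, CIBERSORT, and XGBoost, to DS389 and DS488, the latest RNA-seq human gene expression datasets released by the Dream Challenges community, to obtain the proportions of different cell types. Comparing the correlation between the computed results and the ground-truth data shows that CIBERSORT is the most accurate, followed by XGBoost, with NNLS the least accurate. Notably, this is the first time XGBoost has been used for deconvolution of gene expression data. Building on deep learning, we developed a neural network evaluation model that recommends a deconvolution method for a given dataset. Experimental results show that this deep-learning-based evaluation model can significantly improve the accuracy of deconvolution. Finally, we packaged the deconvolution methods and the evaluation model implemented during the experiments into an R package, cellsorter, and published it on GitHub for other researchers to use and develop further.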
Alternative Abstract
According to the 2018 annual report by the World Health Organization (WHO), cancer is the second leading cause of death globally and was responsible for an estimated 9.6 million deaths that year. In recent years, both the number of new cancer cases and the number of cancer deaths have been increasing. At the same time, deep learning, one of the newest trends in machine learning and artificial intelligence, has brought major advances to many research domains, including bioinformatics, and researchers are eager to use computational methods to study the phenomena and laws of life. Cancer is a genetic disease initiated by genetic mutations and driven by an accumulation of genomic aberrations. Cancer genomics can therefore provide important insights not only into carcinogenesis and cancer progression, but also into cancer detection, diagnosis, and treatment. In this thesis, we apply three deconvolution methods, NNLS, CIBERSORT, and XGBoost, to the latest RNA-seq human gene expression datasets DS389 and DS488, released by the Dream Challenges community, to obtain the proportions of different cell types. By comparing the correlation between the deconvolution results and the ground-truth data, we find that CIBERSORT has the highest accuracy, XGBoost the second highest, and NNLS the lowest. Notably, this is the first time that XGBoost has been used for gene-expression-based deconvolution. Based on deep learning, we developed a novel neural network evaluation model that recommends a deconvolution method for each input gene expression profile. The results suggest that this deep-learning-based evaluation model can significantly improve the accuracy of the deconvolution results. Finally, we developed an open-source R package named cellsorter, which comprises the three deconvolution methods and the novel evaluation model. The R package is shared on GitHub for other researchers to use and improve further.
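The NNLS step named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the cellsorter implementation: the signature matrix and mixing fractions are hypothetical toy values, the function names are invented for this example, and the non-negative least-squares problem is solved with plain projected gradient descent rather than the Lawson-Hanson algorithm that standard NNLS solvers typically use. The estimate is then compared with the known fractions by Pearson correlation, in the spirit of the evaluation the abstract describes.

```python
# Hypothetical toy signature matrix: rows are genes, columns are cell types.
# None of these numbers come from the thesis.
SIGNATURE = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 1.0, 9.0],
    [2.0, 2.0, 2.0],
]


def mix(signature, fractions):
    """Simulate a bulk profile as a weighted sum of cell-type profiles."""
    return [sum(s * f for s, f in zip(row, fractions)) for row in signature]


def nnls_deconvolve(signature, bulk, steps=2000):
    """Minimize 0.5 * ||S f - bulk||^2 subject to f >= 0.

    Uses projected gradient descent (an assumption of this sketch),
    then rescales the solution so the fractions sum to 1.
    """
    n_types = len(signature[0])
    # 1 / ||S||_F^2 never exceeds 1 / L, where L is the gradient's
    # Lipschitz constant, so this fixed step size is safe.
    step = 1.0 / sum(v * v for row in signature for v in row)
    fractions = [0.0] * n_types
    for _ in range(steps):
        residual = [p - b for p, b in zip(mix(signature, fractions), bulk)]
        gradient = [
            sum(row[k] * r for row, r in zip(signature, residual))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        fractions = [max(0.0, f - step * g) for f, g in zip(fractions, gradient)]
    total = sum(fractions)
    return [f / total for f in fractions] if total > 0 else fractions


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Noise-free toy mixture with known ground truth (hypothetical).
true_fractions = [0.5, 0.3, 0.2]
bulk = mix(SIGNATURE, true_fractions)
estimated = nnls_deconvolve(SIGNATURE, bulk)
print("estimated fractions:", [round(f, 4) for f in estimated])
print("Pearson r vs. truth:", round(pearson(estimated, true_fractions), 4))
```

On real data the mixture is noisy and the signature matrix has thousands of genes, so a library solver would be the more practical choice; the pure-Python loop here only keeps the example self-contained.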
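Keywords
Other Keywords
Language
Chinese
Training Category
Joint training
Output Type
Thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/142752
Collection
School of Innovation and Entrepreneurship
Author Affiliation
Southern University of Science and Technology
Recommended Citation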
GB/T 7714
樊磊. 对异质性肿瘤样本的基因表达数据反卷积计算方法的综合评价[D]. Shenzhen: Harbin Institute of Technology, 2020.
Files in This Item
File Name/Size | Document Type | Version Type | Access Type | License
对异质性肿瘤样本的基因表达数据反卷积计算 (4144KB) | -- | -- | Restricted access | --