Title | Deep Neural Networks on Genetic Motif Discovery: the Interpretability and Identifiability Issues
Name | 张宇 (ZHANG Yu)
Name (Pinyin) | ZHANG Yu
Student ID | 11756001
Degree Type | Doctoral
Degree Major | Computer Science
Supervisor |
Supervisor Affiliation | Department of Computer Science and Engineering
External Supervisor | Peter Tino
External Supervisor Affiliation | University of Birmingham
Thesis Defense Date | 2022-03-30
Thesis Submission Date | 2022-07-01
Degree-Granting Institution | University of Birmingham
Degree-Granting Location | Birmingham
Abstract | Deep neural networks have achieved great success in a wide range of research fields and real-world applications. However, as black-box models, their drastic advances in performance come at the cost of interpretability. This is a major concern especially in domains that are safety-critical or subject to ethical and legal requirements (e.g., avoiding algorithmic discrimination). In other settings, interpretability can help scientists gain new "knowledge" learnt by the neural network (e.g., in computational genomics), and neural-network-based genetic motif discovery is one such field. This naturally leads to further questions: Can current neural-network-based motif discovery methods identify the underlying motifs in the data? How robust and reliable are they? In other words, we are interested in the motif identifiability problem. In this thesis, we first conduct a comprehensive review of current neural network interpretability research and propose a novel unified taxonomy which, to the best of our knowledge, provides the most comprehensive and clear categorisation of existing approaches. We then formally study the motif identifiability problem in the context of neural-network-based motif discovery (i.e., given only the predictive performance of a black-box neural network, how well can we recover the underlying "true" motifs by interpreting the learnt model?). Systematic controlled experiments show that although accurate models tend to recover the underlying motifs better, motif identifiability (a measure of the similarity between the true motifs and the learnt motifs) still varies over a wide range. Moreover, the over-complexity (without overfitting) of a high-accuracy model (e.g., using 128 kernels when 16 already suffice) can be harmful to motif identifiability. We therefore propose a robust neural-network-based motif discovery workflow that addresses the above issues, verified on both synthetic and real-world datasets. Finally, we propose probabilistic kernels in place of conventional convolutional kernels and study whether it is better to learn probabilistic motifs directly in the neural network rather than through post hoc interpretation. Experiments show that although probabilistic kernels have some merits (e.g., stable outputs), their performance is not comparable to that of classic convolutional kernels under the same network setting (the same number of kernels).
Keywords |
Language | English
Training Category | Joint Programme
Year of Enrollment | 2017
Year of Degree Conferral | 2022-07
Source Repository | Manual submission
Output Type | Dissertation
Item Identifier | http://sustech.caswiz.com/handle/2SGJ60CL/347870
Collection | College of Engineering, Department of Computer Science and Engineering
Recommended Citation (GB/T 7714) | Zhang Y. Deep Neural Networks on Genetic Motif Discovery: the Interpretability and Identifiability Issues[D]. Birmingham: University of Birmingham, 2022.
Files in This Item | 11756001-张宇-计算机科学与工程 (11669 KB), restricted access (full text available on request)