Title
面向机器学习模型可解释性的反事实样本生成 (Generation of Counterfactual Samples for the Interpretability of Machine Learning Models)
Alternative Title
GENERATION OF COUNTERFACTUAL SAMPLES FOR THE INTERPRETABILITY OF MACHINE LEARNING MODELS
Name
袁宜东
Name (Pinyin)
YUAN Yidong
Student ID
12132910
Degree Type
Master's
Degree Discipline
0701 Mathematics
Discipline Category / Professional Degree Category
07 Science
Supervisor
徐匆
Supervisor's Affiliation
Department of Statistics and Data Science
Thesis Defense Date
2023-04-28
Thesis Submission Date
2023-06-26
Degree-Granting Institution
南方科技大学 (Southern University of Science and Technology)
Degree-Granting Location
Shenzhen
Abstract
Counterfactual explanation is a class of machine learning interpretability methods that explains a model's prediction by generating a set of counterfactual samples that attain a desired output. Existing counterfactual explanation methods, whether generative or optimization-based, rely on the structural equations of a structural causal model (SCM) to preserve and explain the relationships among variables and thereby obtain feasible counterfactual samples. In practice, however, a complete SCM is hard to obtain. This thesis first studies, on two datasets whose complete SCMs are known, the effect of assuming an incorrect SCM on counterfactual explanation methods, using the evaluation metrics commonly adopted in counterfactual interpretability research. The experiments show that several evaluation metrics of the generated counterfactual samples degrade to varying degrees, with the causal preservation score, which measures the feasibility of counterfactual samples, affected the most. Second, we find that when the feature dimension of the data is high, existing counterfactual explanation methods cannot generate counterfactual samples quickly and accurately, and they cannot handle multi-class problems. To address the degraded performance when the correct SCM is unavailable, this thesis constructs an approximate feasibility constraint based on the discriminator of a generative adversarial network (GAN) to better preserve the causal relationships among variables, thereby improving the causal preservation score of the generated counterfactual samples. Finally, for multi-class problems, this thesis builds a new counterfactual sample generation method on top of the GAN generator. Experimental results show that the samples generated by this method satisfy feasibility and related conditions and perform well under various evaluation metrics.
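The discriminator-based feasibility constraint described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the names find_counterfactual, clf, disc, and the weights lambda_dist and lambda_feas are hypothetical, and the thesis's actual architecture and loss terms are not reproduced here. The idea shown is just that a high discriminator score marks a candidate counterfactual as lying on the data distribution, which serves as an approximate, SCM-free proxy for causal feasibility.

```python
# Minimal sketch (PyTorch, hypothetical names) of a counterfactual search that
# adds a GAN-discriminator penalty as an approximate feasibility constraint.
import torch
import torch.nn.functional as F

def find_counterfactual(x, target_class, clf, disc,
                        lambda_dist=0.5, lambda_feas=1.0,
                        steps=500, lr=0.01):
    """Gradient-based counterfactual search with a discriminator penalty.

    x            -- original input, shape (1, d)
    target_class -- desired class index (also works for multi-class targets)
    clf          -- differentiable classifier returning logits
    disc         -- pre-trained GAN discriminator returning P(real) in (0, 1)
    """
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])

    for _ in range(steps):
        opt.zero_grad()
        # 1) Push the classifier toward the desired output.
        pred_loss = F.cross_entropy(clf(x_cf), target)
        # 2) Stay close to the original sample (small, sparse changes).
        dist_loss = torch.norm(x_cf - x, p=1)
        # 3) Approximate feasibility: a high D(x_cf) means the candidate looks
        #    like it came from the data distribution, so the relations among
        #    variables are (approximately) preserved without an explicit SCM.
        feas_loss = -torch.log(disc(x_cf) + 1e-8).mean()
        loss = pred_loss + lambda_dist * dist_loss + lambda_feas * feas_loss
        loss.backward()
        opt.step()
    return x_cf.detach()
```

For the multi-class case, the abstract's generator-based method might look like the following, equally hypothetical, conditional generator: it maps an input and a desired class label directly to a counterfactual sample, avoiding a per-sample optimization loop at test time. The training objective (presumably an adversarial loss plus classification and proximity terms) is not specified in the abstract and is omitted here.

```python
# Hypothetical conditional generator for multi-class counterfactuals.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CounterfactualGenerator(nn.Module):
    def __init__(self, d, n_classes, hidden=128):
        super().__init__()
        self.n_classes = n_classes
        self.net = nn.Sequential(
            nn.Linear(d + n_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, x, target_class):
        # One-hot encode the desired class and condition the generator on it.
        y = F.one_hot(target_class, self.n_classes).float()
        return self.net(torch.cat([x, y], dim=1))
```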
Keywords
Language
Chinese
Training Category
Independently trained (独立培养)
Year of Enrollment
2021
Year Degree Conferred
2023-06
Degree Assessment Subcommittee
Mathematics
Chinese Library Classification (CLC)
O212.1
Source Database
Manual submission
Document Type
Degree thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/543954
Collection
College of Science / Department of Statistics and Data Science
Recommended Citation (GB/T 7714)
袁宜东. 面向机器学习模型可解释性的反事实样本生成[D]. 深圳: 南方科技大学, 2023.
Files in This Item
File Name/Size: 12132910-袁宜东-统计与数据科学 (3556KB)
Access Type: Restricted (full text available on request)