Title
面向机器学习模型可解释性的反事实样本生成 (Generation of Counterfactual Samples for the Interpretability of Machine Learning Models)
Alternative Title
GENERATION OF COUNTERFACTUAL SAMPLES FOR THE INTERPRETABILITY OF MACHINE LEARNING MODELS
Name
袁宜东
Name (Pinyin)
YUAN Yidong
Student ID
12132910
Degree Type
Master's
Degree Discipline
0701 Mathematics
Discipline Category / Professional Degree Category
07 Science
Supervisor
徐匆
Supervisor's Affiliation
Department of Statistics and Data Science
Thesis Defense Date
2023-04-28
Thesis Submission Date
2023-06-26
Degree-Granting Institution
南方科技大学 (Southern University of Science and Technology)
Degree-Granting Location
Shenzhen
Abstract
Counterfactual explanation is a class of machine learning interpretability methods that explains a model's prediction by generating a set of counterfactual samples that attain a desired output. Existing counterfactual explanation methods, whether generative or optimization-based, rely on the structural equations of a structural causal model (SCM) to preserve and explain the relationships among variables and thereby obtain feasible counterfactual samples. In practice, however, a complete SCM is hard to obtain. This thesis first studies, on two datasets whose complete SCMs are known, the effect of assuming an incorrect SCM on counterfactual explanation methods, using the evaluation metrics commonly adopted in counterfactual interpretability research. The experiments show that several evaluation metrics of the generated counterfactual samples degrade to varying degrees, with the causal preservation score, which measures the feasibility of counterfactual samples, affected the most. Second, we find that when the feature dimension of the data is high, existing counterfactual explanation methods cannot generate counterfactual samples quickly and accurately, and they cannot handle multi-class problems. To address the degraded performance when the correct SCM is unavailable, this thesis constructs an approximate feasibility constraint based on the discriminator of a generative adversarial network (GAN) to better preserve the causal relationships among variables, thereby improving the causal preservation score of the generated counterfactual samples. Finally, for multi-class problems, this thesis builds a new counterfactual sample generation method on top of the GAN generator. Experimental results show that the samples generated by this method satisfy feasibility and related conditions and perform well under various evaluation metrics.
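The discriminator-based feasibility constraint described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration only: the names find_counterfactual, clf, disc, and the weights lambda_dist and lambda_feas are hypothetical, and the thesis's actual architecture and loss terms are not reproduced here. The idea shown is just that a high discriminator score marks a candidate counterfactual as lying on the data distribution, which serves as an approximate, SCM-free proxy for causal feasibility.

```python
# Minimal sketch (PyTorch, hypothetical names) of a counterfactual search that
# adds a GAN-discriminator penalty as an approximate feasibility constraint.
import torch
import torch.nn.functional as F

def find_counterfactual(x, target_class, clf, disc,
                        lambda_dist=0.5, lambda_feas=1.0,
                        steps=500, lr=0.01):
    """Gradient-based counterfactual search with a discriminator penalty.

    x            -- original input, shape (1, d)
    target_class -- desired class index (also works for multi-class targets)
    clf          -- differentiable classifier returning logits
    disc         -- pre-trained GAN discriminator returning P(real) in (0, 1)
    """
    x_cf = x.clone().detach().requires_grad_(True)
    opt = torch.optim.Adam([x_cf], lr=lr)
    target = torch.tensor([target_class])

    for _ in range(steps):
        opt.zero_grad()
        # 1) Push the classifier toward the desired output.
        pred_loss = F.cross_entropy(clf(x_cf), target)
        # 2) Stay close to the original sample (small, sparse changes).
        dist_loss = torch.norm(x_cf - x, p=1)
        # 3) Approximate feasibility: a high D(x_cf) means the candidate looks
        #    like it came from the data distribution, so the relations among
        #    variables are (approximately) preserved without an explicit SCM.
        feas_loss = -torch.log(disc(x_cf) + 1e-8).mean()
        loss = pred_loss + lambda_dist * dist_loss + lambda_feas * feas_loss
        loss.backward()
        opt.step()
    return x_cf.detach()
```

For the multi-class case, the abstract's generator-based method might look like the following, equally hypothetical, conditional generator: it maps an input and a desired class label directly to a counterfactual sample, avoiding a per-sample optimization loop at test time. The training objective (presumably an adversarial loss plus classification and proximity terms) is not specified in the abstract and is omitted here.

```python
# Hypothetical conditional generator for multi-class counterfactuals.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CounterfactualGenerator(nn.Module):
    def __init__(self, d, n_classes, hidden=128):
        super().__init__()
        self.n_classes = n_classes
        self.net = nn.Sequential(
            nn.Linear(d + n_classes, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, d),
        )

    def forward(self, x, target_class):
        # One-hot encode the desired class and condition the generator on it.
        y = F.one_hot(target_class, self.n_classes).float()
        return self.net(torch.cat([x, y], dim=1))
```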
Keywords
Language
Chinese
Training Category
Independently trained (独立培养)
Year of Enrollment
2021
Year Degree Conferred
2023-06
Degree Assessment Subcommittee
Mathematics
Chinese Library Classification (CLC)
O212.1
Source Database
Manual submission
Document Type
Degree thesis
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/543954
Collection
College of Science / Department of Statistics and Data Science
Recommended Citation (GB/T 7714)
袁宜东. 面向机器学习模型可解释性的反事实样本生成[D]. 深圳: 南方科技大学, 2023.
Files in This Item
File Name/Size: 12132910-袁宜东-统计与数据科学 (3556KB)
Access Type: Restricted (full text available on request)