Title

To what extent do DNN-based image classification models make unreliable inferences?

Authors
Tian, Yongqiang; Ma, Shiqing; Wen, Ming; Liu, Yepang; Cheung, Shing Chi; Zhang, Xiangyu
Publication Date
2021-09-01
DOI
Journal
Empirical Software Engineering
ISSN
1382-3256
EISSN
1573-7616
Volume 26, Issue 5, Pages 1-40
Abstract

Deep Neural Network (DNN) models are widely used for image classification. While they offer high accuracy, researchers are concerned about whether these models inappropriately make inferences using features irrelevant to the target object in a given image. To address this concern, we propose a metamorphic testing approach that assesses whether a given inference is made based on irrelevant features. Specifically, we propose two metamorphic relations (MRs) to detect such unreliable inferences. These relations expect (a) classification results with different labels, or the same labels but lower certainty, after corrupting the relevant features of images, and (b) classification results with the same labels after corrupting irrelevant features. Inferences that violate either metamorphic relation are regarded as unreliable. Our evaluation demonstrated that our approach can effectively identify unreliable inferences for single-label classification models, with an average precision of 64.1% and 96.4% for the two MRs, respectively. For multi-label classification models, the corresponding precision for MR-1 and MR-2 is 78.2% and 86.5%, respectively. Further, we conducted an empirical study to understand the problem of unreliable inferences in practice. Specifically, we applied our approach to 18 pre-trained single-label image classification models and 3 multi-label classification models, and then examined their inferences on the ImageNet and COCO datasets. We found that unreliable inferences are pervasive: for each model, thousands of correct classifications are actually made using irrelevant features. Next, we investigated the effect of such pervasive unreliable inferences and found that they can cause significant degradation of a model's overall accuracy: after excluding these unreliable inferences from the test set, the model's accuracy can change significantly.
Therefore, we recommend that developers pay more attention to these unreliable inferences during model evaluation. We also explored how unreliable inferences relate to object size, and found that inferences on inputs containing smaller objects are more likely to be unreliable. Lastly, we found that current model training methodologies can guide models to learn object-relevant features to a certain extent, but do not necessarily prevent them from making unreliable inferences. We encourage the community to propose more effective training methodologies to address this issue.
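The violation checks implied by the two metamorphic relations can be sketched as simple predicates over a model's outputs before and after corruption. This is an illustrative sketch, not the paper's implementation: the label/confidence inputs are assumed to come from some classifier run on the original image and on a corrupted copy (e.g., with the object region or the background masked out).

```python
def violates_mr1(label_orig, conf_orig, label_corrupt, conf_corrupt):
    """MR-1: after corrupting the *relevant* features (the object region),
    the model is expected to output a different label, or the same label
    with lower confidence. If neither happens, the original inference
    likely relied on irrelevant features, so MR-1 is violated."""
    if label_corrupt != label_orig:
        return False  # relation holds: the label changed
    return conf_corrupt >= conf_orig  # same label, confidence did not drop


def violates_mr2(label_orig, label_corrupt):
    """MR-2: after corrupting the *irrelevant* features (the background),
    the model is expected to keep the same label. A label change
    violates MR-2."""
    return label_corrupt != label_orig


# Hypothetical example: the prediction keeps its label and confidence
# even after the object itself is corrupted, violating MR-1.
print(violates_mr1("cat", 0.91, "cat", 0.93))  # True: unreliable inference
print(violates_mr2("cat", "dog"))              # True: background change flipped the label
```

An inference flagged by either predicate is counted as unreliable under the corresponding MR; the precision figures above measure how often such flags correspond to genuinely irrelevant-feature reliance.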

Keywords
Related Links: [Scopus record]
Indexed By
SCI ; EI
Language
English
Institutional Authorship
Other
WOS Accession Number
WOS:000663327900002
EI Accession Number
20212510526684
EI Subject Terms
Classification (Of Information) ; Deep Neural Networks
EI Classification Code
Information Theory And Signal Processing:716.1
ESI Discipline
COMPUTER SCIENCE
Scopus Record ID
2-s2.0-85108082989
Source Database
Scopus
Citation Statistics
Times Cited [WOS]: 13
Output Type: Journal article
Item Identifier: http://sustech.caswiz.com/handle/2SGJ60CL/230145
Collection: College of Engineering_Department of Computer Science and Engineering
Author Affiliations
1. Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong
2. Department of Computer Science, Rutgers University, Piscataway, United States
3. School of Cyber Science and Engineering, Huazhong University of Science and Technology, Wuhan, China
4. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
5. Department of Computer Science, Purdue University, West Lafayette, United States
Recommended Citation
GB/T 7714
Tian, Yongqiang, Ma, Shiqing, Wen, Ming, et al. To what extent do DNN-based image classification models make unreliable inferences?[J]. Empirical Software Engineering, 2021, 26(5): 1-40.
APA
Tian, Yongqiang, Ma, Shiqing, Wen, Ming, Liu, Yepang, Cheung, Shing Chi, & Zhang, Xiangyu. (2021). To what extent do DNN-based image classification models make unreliable inferences? Empirical Software Engineering, 26(5), 1-40.
MLA
Tian, Yongqiang, et al. "To what extent do DNN-based image classification models make unreliable inferences?". Empirical Software Engineering 26.5 (2021): 1-40.
Files in This Item
No files are associated with this item.

Unless otherwise stated, all content in this system is protected by copyright, with all rights reserved.