Title | Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets |
Authors | |
Corresponding author | Hu,Guangwu |
Publication date | 2021-09-01
|
DOI | |
Journal | COMPUTERS & SECURITY |
ISSN | 0167-4048
|
EISSN | 1872-6208
|
Volume | 108 |
Abstract | Phishing is a social engineering attack in which perpetrators imitate legitimate websites to lure visitors and illegally acquire their identities, passwords, private data and even property. The attack poses a serious threat and is growing more severe. Many approaches have been proposed to identify phishing websites. For example, the classical CNN-LSTM approach achieves very high precision by combining a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM). However, although CNNs have achieved great success in AI, LSTM still suffers from a bias problem, since it always treats later features as much more important than earlier ones. Meanwhile, because the self-attention mechanism can discover the inner dependency relationships of a text, it has been widely applied to deep learning-based Natural Language Processing (NLP) tasks. If a URL is treated as a text string, this mechanism can learn comprehensive URL representations. To further improve the accuracy of phishing website detection, this paper proposes a novel CNN with self-attention, named self-attention CNN, for identifying phishing Uniform Resource Locators (URLs). Specifically, self-attention CNN first uses a Generative Adversarial Network (GAN) to generate phishing URLs so as to balance the datasets of legitimate and phishing URLs. It then uses a CNN and multi-head self-attention to construct a new classifier comprising four blocks: the input block, the attention block, the feature block and the output block. Finally, the trained classifier gives a high-accuracy result for an unknown website URL. Thorough experiments indicate that self-attention CNN achieves 95.6% accuracy, outperforming CNN-LSTM, single CNN and single LSTM by 1.4%, 4.6% and 2.1%, respectively. |
Keywords | |
Related links | [Scopus record] |
Indexed by | |
Language | English
|
University attribution | Other
|
Funding | National Key Research and Development Program of China["2018YFB1800204","2018YFB1800601"]
; National Natural Science Foundation of China[61972219,61771273]
; Natural Science Foundation of Guangdong Province[2021A1515012640]
; R&D Program of Shenzhen["JCYJ20190813174403598","SGDX20190918101201696","JCYJ20190813165003837"]
|
WOS research area | Computer Science
|
WOS category | Computer Science, Information Systems
|
WOS accession number | WOS:000677639500010
|
Publisher | |
EI accession number | 20212710578501
|
EI subject terms | Computer crime
; Convolution
; Long short-term memory
; Natural language processing systems
|
EI classification codes | Information Theory and Signal Processing:716.1
; Data Processing and Image Processing:723.2
|
ESI research field | COMPUTER SCIENCE
|
Scopus record ID | 2-s2.0-85108874331
|
Source database | Scopus
|
Citation statistics |
Times cited [WOS]: 31
|
Document type | Journal article |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/230141 |
Collection | Southern University of Science and Technology, College of Engineering, Department of Computer Science and Engineering |
Author affiliations | 1. Shenzhen International Graduate School, Tsinghua University, China; 2. Peng Cheng Laboratory, Shenzhen, China; 3. School of Computer Science, Shenzhen Institute of Information Technology, Shenzhen, China; 4. Southern University of Science and Technology, Shenzhen, China |
Recommended citation: GB/T 7714 |
Xiao,Xi,Xiao,Wentao,Zhang,Dianyan,et al. Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets[J]. COMPUTERS & SECURITY,2021,108.
|
APA |
Xiao,Xi.,Xiao,Wentao.,Zhang,Dianyan.,Zhang,Bin.,Hu,Guangwu.,...&Xia,Shutao.(2021).Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets.COMPUTERS & SECURITY,108.
|
MLA |
Xiao,Xi,et al."Phishing websites detection via CNN and multi-head self-attention on imbalanced datasets".COMPUTERS & SECURITY 108(2021).
|
Files in this item | No files are associated with this item. |
|