Title

Utilizing Lexicon-enhanced Approach to Sensitive Information Identification

Authors
Lihua Cai, Yujue Zhou, Yulong Ding, et al.
Publication date
2022
ISBN
978-1-6654-9808-1
Pages
1-6
Conference dates
1-3 Sept. 2022
Conference location
Bristol, United Kingdom
Abstract
Large-scale sensitive information leakage incidents occur frequently and cause serious harm and losses to individuals, enterprises, and society. Most sensitive information resides in unstructured data, which makes it hard for people to recognize when it is being leaked; this is an important cause of information leakage. Sensitive information identification from unstructured data has therefore received extensive attention. Identifying sensitive information in Chinese is especially difficult: the smallest unit of written Chinese is the character, so word boundaries are flexible and often ambiguous. Moreover, because of the sensitivity of the data, no publicly available datasets exist for sensitive information identification. To address these challenges, we first create SPIDC (Sensitive Personal Information Dataset in Chinese) and release it as a public resource for related research. Second, we apply existing sensitive information identification methods, previously used on English datasets, to Chinese data. Third, to address the uncertainty and ambiguity of Chinese word boundaries, we apply three lexicon-enhanced techniques from NER (Named Entity Recognition) to Chinese sensitive information identification for the first time. Experimental results on SPIDC show that the lexicon-enhanced approach outperforms the other methods.
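The abstract attributes much of the difficulty to ambiguous word boundaries in Chinese. As a hedged illustration (not the authors' code, and not necessarily one of the three techniques they evaluated), the sketch below shows a lexicon-matching step of the kind that lexicon-enhanced Chinese NER models typically build on: every lexicon word found in a character sequence is recorded against the characters it covers, grouped into Begin/Middle/End/Single (BMES) word sets in the style of SoftLexicon. The function name `match_lexicon`, the `max_len` parameter, and the toy lexicon are hypothetical.

```python
from typing import Dict, List, Set


# Hypothetical helper: for every character position, collect the lexicon
# words that cover it, split into Begin/Middle/End/Single (BMES) sets.
def match_lexicon(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Dict[str, List[str]]]:
    n = len(sentence)
    sets: List[Dict[str, List[str]]] = [
        {"B": [], "M": [], "E": [], "S": []} for _ in range(n)
    ]
    for start in range(n):
        # Enumerate every substring starting here, up to max_len characters.
        for end in range(start + 1, min(start + max_len, n) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if end - start == 1:
                sets[start]["S"].append(word)  # single-character word
            else:
                sets[start]["B"].append(word)  # word begins at this character
                sets[end - 1]["E"].append(word)  # word ends at this character
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].append(word)  # character inside the word
    return sets


if __name__ == "__main__":
    # Toy lexicon (an assumption for illustration, not SPIDC data).
    lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江"}
    sentence = "南京市长江大桥"
    for char, word_sets in zip(sentence, match_lexicon(sentence, lexicon)):
        print(char, word_sets)
```

In this example the character 长 is recorded both as the end of 市长 ("mayor") and as the beginning of 长江 and 长江大桥 ("Yangtze River Bridge"), so the competing segmentations of 南京市长江大桥 both remain visible to a downstream tagger instead of being fixed in advance by a single word segmenter.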
Related links
[IEEE record]
Source database
IEEE
Full-text link
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9911164
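Citation statistics
Times cited [WOS]: 0
Output type
Conference paper
Item identifier
http://sustech.caswiz.com/handle/2SGJ60CL/406489
Collections
Academy for Advanced Interdisciplinary Studies
College of Engineering / Department of Computer Science and Engineering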
Author affiliations
1.Department of Computer Science and Engineering, University of Warwick, Coventry, United Kingdom
2.Department of Computer Science, University of Warwick, Coventry, United Kingdom
3.Academy for Advanced Interdisciplinary Studies, Southern University of Science and Technology, Shenzhen, China
4.Shenzhen Key Laboratory of Future Industrial Internet Safety and Security, University of Warwick, Coventry, United Kingdom
Recommended citation
GB/T 7714
Lihua Cai,Yujue Zhou,Yulong Ding,et al. Utilizing Lexicon-enhanced Approach to Sensitive Information Identification[C],2022:1-6.
Files in this item
No files are associated with this item.

Unless otherwise stated, all content in this system is protected by copyright, and all rights are reserved.