Title

A context-enhanced sentence representation learning method for close domains with topic modeling

Authors
Corresponding author: Li, Shuangyin
Publication date
2022-08-01
DOI
Journal
ISSN
0020-0255
EISSN
1872-6291
Volume: 607, Pages: 186-210
Abstract
Sentence representation approaches have been widely used and proven effective in many text modeling tasks and downstream applications. Many recent proposals learn sentence representations with deep neural frameworks. However, these methods are pre-trained in open domains and depend on the availability of large-scale data for model fitting. As a result, they may fail in some special scenarios where data are sparse and embedding interpretations are required, such as legal, medical, or technical fields. In this paper, we present an unsupervised learning method to exploit representations of sentences for some closed domains via topic modeling. We reformulate the inference process of the sentences with the corresponding contextual sentences and the associated words, and propose an effective context-enhanced process called bi-Directional Context-enhanced Sentence Representation Learning (bi-DCSR). This method takes advantage of the semantic distributions of the nearby contextual sentences and the associated words to form a context-enhanced sentence representation. To support the bi-DCSR, we develop a novel Bayesian topic model, called the Hybrid Priors Topic Model (HPTM), to embed sentences and words into the same latent interpretable topic space. Based on the topic space defined by the HPTM, the bi-DCSR method learns the embedding of a sentence from the two-directional contextual sentences and the words in it, which allows us to efficiently learn high-quality sentence representations in such closed domains. In addition to an open-domain dataset from Wikipedia, our method is validated using three closed-domain datasets from legal cases, electronic medical records, and technical reports. Our experiments indicate that the HPTM significantly outperforms existing topic models on language modeling and topic coherence. Meanwhile, the bi-DCSR method not only outperforms the state-of-the-art unsupervised learning methods on closed-domain sentence classification tasks, but also yields competitive performance compared to these established approaches on the open domain. Additionally, the visualizations of the semantics of sentences and words demonstrate the interpretable capacity of our model.
(c) 2022 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords
Indexed by
SCI ; EI
Language
English
Institutional authorship
Other
Funding
National Natural Science Foundation of China[62006083] ; GuangZhou Basic and Applied Basic Research Foundation[202102020654] ; Applied Basic Research Fund of Guangdong Province[2019B1515120085]
WOS Research Area
Computer Science
WOS Subject Category
Computer Science, Information Systems
WOS Accession Number
WOS:000817892200011
Publisher
EI Accession Number
20222412213143
EI Subject Terms
Embeddings ; Medical computing ; Modeling languages ; Unsupervised learning
EI Classification Codes
Biomedical Engineering:461.1 ; Artificial Intelligence:723.4 ; Computer Applications:723.5
ESI Research Field
COMPUTER SCIENCE
Source database
Web of Science
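Citation statistics
Times cited [WOS]: 2
Document type: Journal article
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/355862
Collection: College of Engineering_Department of Computer Science and Engineering
Author affiliations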
1.South China Normal Univ, Sch Comp Sci, Guangzhou, Guangdong, Peoples R China
2.Sun Yat sen Univ, Sch Data & Comp Sci, Guangzhou, Guangdong, Peoples R China
3.Southern Univ Sci & Technol, Dept Comp Sci & Engn, Shenzhen, Guangdong, Peoples R China
Recommended citation
GB/T 7714
Li, Shuangyin,Chen, Weiwei,Zhang, Yu,et al. A context-enhanced sentence representation learning method for close domains with topic modeling[J]. INFORMATION SCIENCES,2022,607:186-210.
APA
Li, Shuangyin.,Chen, Weiwei.,Zhang, Yu.,Zhao, Gansen.,Pan, Rong.,...&Tang, Yong.(2022).A context-enhanced sentence representation learning method for close domains with topic modeling.INFORMATION SCIENCES,607,186-210.
MLA
Li, Shuangyin,et al."A context-enhanced sentence representation learning method for close domains with topic modeling".INFORMATION SCIENCES 607(2022):186-210.