Title | Enhancing Code Representation Learning for Code Search with Abstract Code Semantics |
Authors | |
DOI | |
Publication date | 2024-07-05
|
ISSN | 2161-4393
|
ISBN | 979-8-3503-5932-9
|
Proceedings title | |
Conference date | 30 June-5 July 2024
|
Conference location | Yokohama, Japan
|
Abstract | Code representation learning is an important way to encode the semantics of source code through pre-training. The learned representations support a variety of downstream tasks, such as natural language code search and code defect detection. Inspired by pre-trained models for natural language representation learning, existing approaches often treat the source code or its structural information (e.g., the Abstract Syntax Tree, or AST) as a plain token sequence. Unlike natural language, a programming language has its own code unit information (e.g., identifiers and expressions) and logic information (e.g., the functionality of a code snippet). To exploit these properties, we propose Abstract Code Embedding (AbCE), a self-supervised learning method that considers the abstract semantics of code logic. Instead of scattered tokens, AbCE treats an entire node or subtree of an AST as a basic code unit during pre-training, which keeps each code unit intact. Moreover, AbCE learns the abstract semantics of AST nodes through self-distillation. Experimental results show that it achieves significant improvements over state-of-the-art baselines on code search tasks and comparable performance on code clone detection and defect detection tasks, even without using contrastive learning or curriculum learning. |
University authorship order | First
|
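Related links | [IEEE record] |
Citation statistics | |
Output type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/828704 |
Collection | College of Engineering_Department of Computer Science and Engineering |
Author affiliations | 1. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China 2. Peng Cheng Laboratory, Shenzhen, China 3. Distributed and Parallel Software Lab, Huawei, Shenzhen, China |
First author affiliation | Department of Computer Science and Engineering |
First author's primary affiliation | Department of Computer Science and Engineering |
Recommended citation (GB/T 7714) |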
Shaojie Zhang,Yiwei Ding,Enrui Hu,et al. Enhancing Code Representation Learning for Code Search with Abstract Code Semantics[C],2024.
|
Files in this item | There are no files associated with this item. |
|