Title | DGCL: An efficient communication library for distributed GNN training |
Authors | |
Corresponding author | Yan, Xiao |
DOI | |
Publication date | 2021-04-21
|
Conference name | 16th European Conference on Computer Systems (EuroSys)
|
Proceedings title | |
Pages | 130-144
|
Conference date | April 26-28, 2021
|
Conference location | ELECTR NETWORK
|
Place of publication | 1601 Broadway, 10th Floor, New York, NY, United States
|
Publisher | |
Abstract | Graph neural networks (GNNs) have gained increasing popularity in many areas such as e-commerce, social networks and bio-informatics. Distributed GNN training is essential for handling large graphs and reducing the execution time. However, for distributed GNN training, a peer-to-peer communication strategy suffers from high communication overheads. Also, different GPUs require different remote vertex embeddings, which leads to an irregular communication pattern and renders existing communication planning solutions unsuitable. We propose the distributed graph communication library (DGCL) for efficient GNN training on multiple GPUs. At the heart of DGCL is a communication planning algorithm tailored for GNN training, which jointly considers fully utilizing fast links, fusing communication, avoiding contention and balancing loads on different links. DGCL can be easily adopted to extend existing single-GPU GNN systems to distributed training. We conducted extensive experiments on different datasets and network configurations to compare DGCL with alternative communication schemes. In our experiments, DGCL reduces the communication time of the peer-to-peer communication by 77.5% on average and the training time for an epoch by up to 47%. |
Keywords | |
Institutional attribution | Corresponding author
|
Language | English
|
Related links | [Scopus record] |
Indexed by | |
Funding | RGC of HKSAR [GRF 14208318]
|
WOS research area | Computer Science
|
WOS categories | Computer Science, Hardware & Architecture; Computer Science, Information Systems; Computer Science, Theory & Methods
|
WOS accession number | WOS:000744467200009
|
EI accession number | 20211910317392
|
EI main heading | Program processors
|
EI classification code | Management: 912.2
|
Scopus record ID | 2-s2.0-85105275786
|
Source database | Scopus
|
Citation statistics |
Times cited [WOS]: 62
|
Output type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/228468 |
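Collection | Southern University of Science and Technology, College of Engineering, Department of Computer Science and Engineering |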
Author affiliations | 1. The Chinese University of Hong Kong, Hong Kong 2. Southern University of Science and Technology, China 3. Huawei Technologies Co. Ltd. |
Corresponding author affiliation | Southern University of Science and Technology |
Recommended citation (GB/T 7714) |
Cai,Zhenkun,Yan,Xiao,Wu,Yidi,et al. DGCL: An efficient communication library for distributed GNN training[C]. 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES:ASSOC COMPUTING MACHINERY,2021:130-144.
|
Files in this item | ||||||
File name/size | Document type | Version type | Access type | License | Actions | |
DGCL.pdf (889KB) | -- | -- | Restricted access | -- |
|