Title

DGCL: An efficient communication library for distributed GNN training

Authors
Corresponding author: Yan, Xiao
DOI
Publication date
2021-04-21
Conference name
16th European Conference on Computer Systems (EuroSys)
Proceedings title
Pages
130-144
Conference dates
APR 26-28, 2021
Conference location
ELECTR NETWORK
Place of publication
1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES
Publisher
Abstract

Graph neural networks (GNNs) have gained increasing popularity in many areas such as e-commerce, social networks and bio-informatics. Distributed GNN training is essential for handling large graphs and reducing the execution time. However, for distributed GNN training, a peer-to-peer communication strategy suffers from high communication overheads. Also, different GPUs require different remote vertex embeddings, which leads to an irregular communication pattern and renders existing communication planning solutions unsuitable. We propose the distributed graph communication library (DGCL) for efficient GNN training on multiple GPUs. At the heart of DGCL is a communication planning algorithm tailored for GNN training, which jointly considers fully utilizing fast links, fusing communication, avoiding contention and balancing loads on different links. DGCL can be easily adopted to extend existing single-GPU GNN systems to distributed training. We conducted extensive experiments on different datasets and network configurations to compare DGCL with alternative communication schemes. In our experiments, DGCL reduces the communication time of the peer-to-peer communication by 77.5% on average and the training time for an epoch by up to 47%.

Keywords
University attribution
Corresponding
Language
English
Related links: [Scopus record]
Indexed by
Funding
RGC of HKSAR[GRF 14208318]
WOS research area
Computer Science
WOS categories
Computer Science, Hardware & Architecture; Computer Science, Information Systems; Computer Science, Theory & Methods
WOS accession number
WOS:000744467200009
EI accession number
20211910317392
EI main heading
Program processors
EI classification code
Management:912.2
Scopus record ID
2-s2.0-85105275786
Source database
Scopus
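Citation statistics
Times cited [WOS]: 62
Output type: Conference paper
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/228468
Collection: Southern University of Science and Technology
College of Engineering, Department of Computer Science and Engineering
Author affiliations
1. The Chinese University of Hong Kong, Hong Kong
2. Southern University of Science and Technology, China
3. Huawei Technologies Co. Ltd
Corresponding author affiliation: Southern University of Science and Technology
Recommended citation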
GB/T 7714
Cai,Zhenkun,Yan,Xiao,Wu,Yidi,et al. DGCL: An efficient communication library for distributed GNN training[C]. 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES:ASSOC COMPUTING MACHINERY,2021:130-144.
Files in this item
File name/size | Document type | Version type | Access type | License
DGCL.pdf (889KB) | -- | -- | Restricted access | --
