Title | DSP: Efficient GNN Training with Multiple GPUs |
Authors | |
Corresponding author | Yan, Xiao |
DOI | |
Publication date | 2023-02-25
|
Conference name | 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023
|
ISBN | 9798400700156
|
Proceedings title | |
Pages | 392-404
|
Conference dates | February 25, 2023 - March 1, 2023
|
Conference location | Montreal, QC, Canada
|
Proceedings editor / conference organizer | ACM SIGHPC; ACM SIGPLAN; HUAWEI
|
Publisher | |
Abstract | Jointly utilizing multiple GPUs to train graph neural networks (GNNs) is crucial for handling large graphs and achieving high efficiency. However, we find that existing systems suffer from high communication costs and low GPU utilization due to improper data layout and training procedures. Thus, we propose a system dubbed Distributed Sampling and Pipelining (DSP) for multi-GPU GNN training. DSP adopts a tailored data layout to utilize the fast NVLink connections among the GPUs, which stores the graph topology and popular node features in GPU memory. For efficient graph sampling with multiple GPUs, we introduce a collective sampling primitive (CSP), which pushes the sampling tasks to data to reduce communication. We also design a producer-consumer-based pipeline, which allows tasks from different mini-batches to run concurrently to improve GPU utilization. We compare DSP with state-of-the-art GNN training frameworks, and the results show that DSP consistently outperforms the baselines under different datasets, GNN models and GPU counts. The speedup of DSP can be up to 26x and is over 2x in most cases. © 2023 ACM. |
Institutional attribution | Corresponding author
|
Language | English
|
Indexed in | |
EI accession number | 20231013675700
|
EI subject terms | Deep learning
; Digital signal processing
; Graph neural networks
; Program processors
; Topology
|
EI classification codes | Ergonomics and Human Factors Engineering:461.4
; Semiconductor Devices and Integrated Circuits:714.2
; Computer Circuits:721.3
; Artificial Intelligence:723.4
; Combinatorial Mathematics, Includes Graph Theory, Set Theory:921.4
|
Source database | EV Compendex
|
Citation statistics |
Times cited [WOS]: 0
|
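Document type | Conference paper |
Item identifier | http://sustech.caswiz.com/handle/2SGJ60CL/519763 |
Collection | College of Engineering_Department of Computer Science and Engineering |
Author affiliations | 1.Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong 2.Department of Computer Science and Engineering, Southern University of Science and Technology, China 3.Amazon Web Services |
Corresponding author affiliation | Department of Computer Science and Engineering |
Recommended citation (GB/T 7714) |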
Cai, Zhenkun,Zhou, Qihui,Yan, Xiao,et al. DSP: Efficient GNN Training with Multiple GPUs[C]//ACM SIGHPC; ACM SIGPLAN; HUAWEI:Association for Computing Machinery,2023:392-404.
|
Files in this item | There are no files associated with this item. |
|