Title

PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters

Authors
Corresponding author: Yan, Xiao
Publication date
2024-06-01
DOI
Journal
Parallel Computing
ISSN
0167-8191
Volume: 120
Abstract
Multi-tenant GPU clusters are common, where users purchase GPU quota to run their neural network training jobs. However, strict quota-based scheduling often leads to cluster under-utilization, while allowing quota groups to use excess GPUs improves utilization but results in fairness problems. We propose PPS, a probabilistic prediction based scheduler, which uses job history statistics to predict future cluster status for making good scheduling decisions. Different from existing schedulers that rely on deep learning frameworks to adjust bad scheduling decisions and/or require detailed job information, PPS treats jobs as black boxes in that PPS runs a job to completion without adjustment once scheduled and requires only aggregate job statistics. The black-box feature is favorable due to its good generality, compatibility and security, and made possible by the predictability of aggregate resource utilization statistics of large clusters. Extensive experiments show that PPS achieves high cluster utilization and good fairness simultaneously.
Keywords
Related links: [Scopus record]
Indexed by
SCI ; EI
Language
English
University attribution
Corresponding
ESI subject category
COMPUTER SCIENCE
Scopus record ID
2-s2.0-85189175736
Source database
Scopus
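Output type: Journal article
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/741113
Collection: College of Engineering / Department of Computer Science and Engineering
Author affiliations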
1. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, Hong Kong
2. Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, China
3. Alibaba Group, Beijing, China
Corresponding author affiliation: Department of Computer Science and Engineering
Recommended citation
GB/T 7714
Ma, Kaihao, Cai, Zhenkun, Yan, Xiao, et al. PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters[J]. Parallel Computing, 2024, 120.
APA
Ma, Kaihao., Cai, Zhenkun., Yan, Xiao., Zhang, Yang., Liu, Zhi., ... & Cheng, James. (2024). PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters. Parallel Computing, 120.
MLA
Ma, Kaihao, et al. "PPS: Fair and efficient black-box scheduling for multi-tenant GPU clusters". Parallel Computing 120 (2024).