Title

Cluster-Reduce: Compressing Sketches for Distributed Data Streams

Authors
Corresponding author: Yang, Tong
DOI
Publication date
2021-08-14
Proceedings title
Pages
2316-2326
Abstract
Sketches, a type of probabilistic algorithm, have been widely accepted as approximate summaries of data streams. Compressing sketches is the best choice in distributed data streams to reduce communication overhead. The ideal compression algorithm should meet the following three requirements: high efficiency of the compression procedure, support for direct queries without decompression, and high accuracy of the compressed sketches. However, no prior work meets these requirements at the same time. In particular, accuracy is poor after compression with existing methods. In this paper, we propose Cluster-Reduce, a framework for compressing sketches that meets all three requirements. Our key technique, nearness clustering, rearranges adjacent counters with similar values in the sketch to significantly improve accuracy. We use Cluster-Reduce to compress four kinds of sketches in two use cases: distributed data streams and distributed machine learning. Extensive experimental results show that Cluster-Reduce can achieve up to 60 times smaller error than prior works. The source code of Cluster-Reduce is available on GitHub anonymously[1].
Keywords
University attribution
Other
Language
English
Related links [Scopus record]
Indexed by
EI accession number
20213810905446
EI subject terms
Data mining ; Data streams
EI classification code
Data Processing and Image Processing:723.2
Scopus record ID
2-s2.0-85114933154
Source database
Scopus
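Citation statistics
Times cited [WOS]: 4
Document type: Conference paper
Item identifier: http://sustech.caswiz.com/handle/2SGJ60CL/245948
Collection: Southern University of Science and Technology
College of Engineering_Department of Computer Science and Engineering
Institute of Future Networks
Author affiliations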
1. Department of Computer Science and Technology, National Engineering Laboratory for Big Data Analysis Technology and Application, Peking University, China
2. Peng Cheng Laboratory, Shenzhen, China
3. School of Software and Microelectronics, Peking University, China
4. Huawei Theory Lab, Shenzhen, China
5. Southern University of Science and Technology, Shenzhen, China
Recommended citation
GB/T 7714
Zhao,Yikai,Zhong,Zheng,Li,Yuanpeng,et al. Cluster-Reduce: Compressing Sketches for Distributed Data Streams[C],2021:2316-2326.
Files in this item
No files are associated with this item.
