Title | A Multi-Precision Sparse Neural Network Accelerator Based on NAS Optimization
Other Title | A Sparse and Mixed-Precision Accelerator for NAS-Optimized Convolutional Neural Networks
Name | 刘禹岑
Name (Pinyin) | LIU Yucen
Student ID | 12132461
Degree Type | Master
Degree Major | 0856 Materials and Chemical Engineering
Discipline Category/Professional Degree Category | 0856 Materials and Chemical Engineering
Advisor |
Advisor Affiliation | School of Microelectronics
Thesis Defense Date | 2023-05-18
Thesis Submission Date | 2023-06-21
Degree-Granting Institution | Southern University of Science and Technology
Degree-Granting Location | Shenzhen
Abstract | Neural networks are now widely used in image recognition, natural language processing, speech recognition, behavior prediction, and other fields, and the need to deploy them on edge devices keeps growing. Model compression makes on-device inference feasible, but the changes it introduces into data and computation patterns prevent existing accelerators from fully exploiting compression to improve inference efficiency. Moreover, neural architecture search (NAS) algorithms can optimize compressed models by assigning each layer a suitable compression configuration, greatly reducing the network's computation and storage while only slightly reducing its inference accuracy. Current accelerator designs cannot adapt well to the complex networks produced by NAS, which combine multiple precisions with multiple sparsity levels. To meet these requirements, this thesis presents a sparse, multi-precision neural network accelerator (SMP) system with the following features. First, the accelerator supports four data precision modes (1/2/4/8-bit) and structured weight sparsity at levels such as 50%, 75%, and 87.5%, and it introduces an efficient compression format for sparse data at low precision that reduces sparse-address overhead. Second, it adopts a novel vector systolic computing array, which achieves better timing than conventional parallel arrays and lower latency than atomic systolic arrays. Third, to accommodate on-chip storage across multiple precisions and sparsity levels and to reduce wasted SRAM capacity, a hybrid SRAM splicing scheme is proposed; compared with a storage strategy without splicing, it raises average computing throughput to 3.33x. Finally, whereas typical ASIC accelerators have rigid structures and tightly coupled modules, this accelerator is highly extensible: its internal control unit and control-bus protocol decouple the modules, so other operator units can easily be integrated to support more types of network workloads. The design was verified through front-end functional and power simulation with Synopsys EDA tools in a 28 nm process, and netlist simulation was performed with a mixed-precision, sparse NAS-VGG16 network whose Top-1 accuracy is 67.7%. Peak energy efficiency is 10.89 TOPS/W for the 4-bit layers and 23.90 TOPS/W for the 87.5%-sparse 8-bit layers, with an average of 15 TOPS/W across all layers in netlist simulation. While retaining generality and extensibility, the accelerator achieves a 1.07-3.89x energy-efficiency improvement over other sparse accelerators.
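The abstract highlights an efficient compression format for structurally sparse weights at low precision that cuts sparse-address overhead, but it does not spell the format out. As a hedged illustration only, the Python sketch below assumes a hypothetical group-structured scheme in which each group of 2, 4, or 8 weights (for 50%, 75%, and 87.5% sparsity) keeps a single nonzero, stored as one packed low-precision value plus a 1-3 bit in-group index rather than a full address; the name compress_structured and all parameters are invented for this example, not taken from the thesis.

import numpy as np

def compress_structured(weights, sparsity, wbits):
    """Toy encoder for group-structured weight sparsity (hypothetical
    scheme, not necessarily the thesis's actual format).

    sparsity 0.5 / 0.75 / 0.875 maps to groups of 2 / 4 / 8 with one
    survivor per group; each survivor is stored as a wbits-wide value
    plus a log2(group)-bit in-group index instead of a full address.
    """
    group = int(round(1.0 / (1.0 - sparsity)))   # 2, 4, or 8
    assert weights.size % group == 0
    vals, idxs = [], []
    for g in weights.reshape(-1, group):
        k = int(np.argmax(np.abs(g)))            # keep the largest-magnitude weight
        vals.append(int(g[k]))
        idxs.append(k)
    idx_bits = int(np.log2(group))               # 1, 2, or 3 index bits per survivor
    packed_bits = len(vals) * (wbits + idx_bits) # total compressed size in bits
    return vals, idxs, packed_bits

w = np.random.randint(-8, 8, size=64)            # toy 4-bit signed weights
vals, idxs, bits = compress_structured(w, sparsity=0.75, wbits=4)
print(len(vals), "weights kept of", w.size, "->", bits, "bits packed")

Under these assumptions, a 75%-sparse 4-bit layer costs 6 bits per kept weight (a 4-bit value plus a 2-bit index), so the 64-weight toy tensor packs into 96 bits versus 256 bits dense, which is the kind of address-overhead saving the abstract alludes to.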
Keywords |
Other Keywords |
Language | Chinese
Training Category | Independent training
Year of Enrollment | 2021
Year Degree Conferred | 2023-06
Degree Evaluation Subcommittee | Materials and Chemical Engineering
CLC Number | TN492
Source | Manual submission
Document Type | Thesis
Identifier | http://sustech.caswiz.com/handle/2SGJ60CL/543926
Collection | Preparatory Office of the SUSTech-HKUST Shenzhen-Hong Kong Microelectronics Institute
Recommended Citation (GB/T 7714) | LIU Yucen. A Multi-Precision Sparse Neural Network Accelerator Based on NAS Optimization[D]. Shenzhen: Southern University of Science and Technology, 2023.
Files in This Item:
File Name/Size | Document Type | Version | Access | License
12132461-刘禹岑-南方科技大学-(5108KB) | -- | -- | Restricted | --