Title | Design and Optimization of a Convolutional Neural Network Accelerator Based on the Fermat Number Transform |
Alternative Title | DESIGN AND OPTIMIZATION OF CONVOLUTIONAL NEURAL NETWORK ACCELERATOR BASED ON FERMAT NUMBER TRANSFORM |
Name | |
Name (Pinyin) | CHEN Bingzhen |
Student ID | 12132103 |
Degree Type | Master's |
Degree Program | 080902 Circuits and Systems |
Discipline Category | 08 Engineering |
Supervisor | |
Supervisor Affiliation | Institute of Nanoscience and Applications |
Defense Date | 2024-05-08 |
Submission Date | 2024-06-24 |
Degree-Granting Institution | Southern University of Science and Technology |
Degree-Granting Location | Shenzhen |
Abstract | With the rapid advancement of artificial intelligence, deep learning, a key branch of AI, has attracted widespread attention. Convolutional neural networks (CNNs), classic deep learning algorithms, are widely applied in fields such as autonomous driving and image processing. Convolution is the computational foundation of CNN applications and accounts for most of the computation in a CNN, so raising its speed is particularly important. With the deep integration of AI and Internet of Things technology, demand for edge computing keeps growing, and the need to design energy-efficient edge computing chips is increasingly urgent. Against this backdrop, the RISC-V open-source processor, with its free and open-source nature, simple design, and strong extensibility, has become a preferred choice for edge computing chips. This thesis proposes a CNN accelerator for edge computing based on the RISC-V RI5CY open-source processor, named RI5CY-FNT. The design contains a convolution acceleration unit based on the Fermat Number Transform (FNT), a pooling unit, and an activation unit, with custom instructions designed for the acceleration modules. The FNT is chosen as the convolution acceleration algorithm: all of its computations are over real numbers, which significantly reduces complexity compared with the complex-valued Fast Fourier Transform (FFT). Compared with the mainstream Winograd algorithm, the FNT supports more convolution kernel sizes, including 3x3 and 5x5 kernels, without consuming extra resources. In addition, a new encoding scheme is used to design the fast-FNT convolution accelerator, simplifying multiplication and modulo operations into bit operations in a dedicated convolution accelerator. Corresponding RISC-V custom instructions are then designed for the accelerator's structure, reducing instruction overhead and accelerating the convolution computation. Classic CNNs serve as test cases: PyTorch is used for network training and parameter quantization, the custom instructions are wrapped into C functions via inline assembly, and the CNNs are built in C. Experiments on the core-v-verif virtual verification platform and on an FPGA platform show that the RI5CY-FNT processor's instructions function correctly and that it performs CNN inference normally. DC simulation based on the SMIC 55 nm CMOS process shows that, compared with the original RI5CY processor, RI5CY-FNT adds about 12% area and 23% power. FPGA experiments show that RI5CY-FNT reduces energy consumption by about 40% relative to the original RI5CY when running LeNet-5 inference. Moreover, compared with the original processor, RI5CY-FNT achieves a 3.6x speedup with 3x3 kernels and a 10.6x speedup with 5x5 kernels on LeNet-5 inference, and a 5.5x speedup on VGG16 inference. |
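The FNT-based convolution described in the abstract can be illustrated with a short numerical model. This is a hedged sketch, not the thesis's hardware design: it assumes the Fermat prime F_4 = 2^16 + 1, a transform length dividing 32 (since 2 has multiplicative order 32 modulo F_4), and a naive O(N^2) transform in place of the fast butterfly structure.

```python
# Illustrative model of convolution via the Fermat Number Transform (FNT),
# assuming the Fermat prime F_4 = 2^16 + 1. Not the thesis's RTL design.
F = (1 << 16) + 1  # Fermat prime 65537

def fnt(a, root):
    """Naive O(N^2) number-theoretic transform mod F (for illustration only)."""
    n = len(a)
    return [sum(a[j] * pow(root, i * j, F) for j in range(n)) % F
            for i in range(n)]

def conv_fnt(x, h, n=16):
    """Linear convolution of x and h via a length-n FNT (n must divide 32)."""
    assert len(x) + len(h) - 1 <= n
    root = pow(2, 32 // n, F)          # 2 has order 32 mod F, so root has order n
    xp = list(x) + [0] * (n - len(x))  # zero-pad to the transform length
    hp = list(h) + [0] * (n - len(h))
    X, H = fnt(xp, root), fnt(hp, root)
    Y = [(a * b) % F for a, b in zip(X, H)]   # pointwise product
    inv_root = pow(root, F - 2, F)            # modular inverses via Fermat's
    inv_n = pow(n, F - 2, F)                  # little theorem (F is prime)
    y = fnt(Y, inv_root)                      # inverse transform, then scale
    return [(v * inv_n) % F for v in y][: len(x) + len(h) - 1]

print(conv_fnt([1, 2, 3], [4, 5]))  # → [4, 13, 22, 15], matching direct convolution
```

Because every twiddle factor is a power of 2 modulo F_4, a hardware implementation can replace each multiplication by a root with a bit shift plus a cheap modular correction, which is the property an FNT accelerator exploits.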
Other Abstract | With the rapid advancement of AI technology, deep learning, as a key branch of it, has garnered widespread attention. Convolutional Neural Networks (CNNs), as classic algorithms in deep learning, are extensively used in fields like autonomous driving and image processing. Convolution, as the computational foundation in CNN applications, occupies most of the computation in CNNs, making the improvement of its operation speed particularly important. With the deep integration of artificial intelligence and Internet of Things technology, the demand for edge computing is increasing, and the need to design high-performance edge computing chips has become more urgent. Against this backdrop, the RISC-V open-source processor, with its free and open-source nature, simple design, and powerful scalability, has become the preferred solution for designing edge computing chips. This article proposes a convolutional neural network accelerator based on the RISC-V RI5CY open-source processor for edge computing, named RI5CY-FNT. This design includes a convolution acceleration unit based on the Fermat Number Transform (FNT), a pooling unit, and an activation unit, and custom instructions are designed for the acceleration module. The Fermat Number Transform algorithm is chosen as the convolution acceleration algorithm in this paper. All calculations in the Fermat Number Transform are based on real numbers, which significantly reduces complexity compared to Fast Fourier Transform (FFT) calculations based on complex numbers. Compared with the mainstream Winograd algorithm, this algorithm can select more convolution kernel sizes without additional resource consumption, including 3 x 3 and 5 x 5 convolution kernels. In addition, the convolution accelerator for the fast Fermat number transform is designed with a new encoding method, which simplifies multiplication and modulo operations into bit operations in a dedicated convolution accelerator. Then, corresponding RISC-V custom instructions are designed for the structure of the accelerator, reducing instruction overhead and accelerating the convolution computation process. This paper uses classic convolutional neural networks as test cases, uses PyTorch for network training and parameter quantization, encapsulates custom instructions into C language functions through embedded assembly, and uses C language to build convolutional neural networks. Experiments on the virtual verification platform core-v-verif and an FPGA platform show that the RI5CY-FNT processor's instructions function correctly and that it performs CNN inference tasks normally. The RI5CY-FNT processor is DC-simulated based on the SMIC 55 nm CMOS process. Compared with the original RI5CY processor, it increases the area by about 12% and the power consumption by about 23%. Experiments based on the FPGA platform show that when the RI5CY-FNT processor performs inference tasks on LeNet-5, it reduces energy consumption by about 40% compared to the original RI5CY processor. Moreover, compared to the original processor, the RI5CY-FNT processor achieves a speedup of 3.6x with a 3 x 3 convolution kernel and 10.6x with a 5 x 5 convolution kernel when performing inference tasks on the LeNet-5 network, and achieves a speedup of 5.5x when performing inference tasks on the VGG16 network. |
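The claim that modulo operations reduce to bit operations rests on the identity 2^b ≡ -1 (mod 2^b + 1): a wide integer can be reduced modulo a Fermat number by alternately adding and subtracting its b-bit digits, using only shifts and masks. The function below is an illustrative software model of that standard trick; the thesis's actual encoding and circuit are not reproduced here.

```python
def mod_fermat(x, b=16):
    """Reduce a non-negative integer x modulo the Fermat number 2^b + 1
    using only shifts, masks, and add/subtract (an illustrative model)."""
    F = (1 << b) + 1
    mask = (1 << b) - 1
    acc, sign = 0, 1
    while x:
        acc += sign * (x & mask)  # b-bit digits contribute with alternating sign,
        x >>= b                   # since 2^b = -1 (mod 2^b + 1)
        sign = -sign
    # acc now lies within a few multiples of F; fold it into [0, F)
    # with conditional add/subtract instead of a divider.
    while acc < 0:
        acc += F
    while acc >= F:
        acc -= F
    return acc

print(mod_fermat(65536 * 65536))  # → 1, since 2^32 = 1 (mod 2^16 + 1)
```

A hardware datapath implements the same reduction as a short chain of adders and a final conditional correction, so no division unit is ever needed.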
Keywords | |
Other Keywords | |
Language | Chinese |
Training Category | Independently trained |
Year of Enrollment | 2021 |
Year Degree Granted | 2024-06 |
Degree Evaluation Subcommittee | Electronic Science and Technology |
CLC Number | TN47 |
Source Database | Manual submission |
Document Type | Degree thesis |
Identifier | http://sustech.caswiz.com/handle/2SGJ60CL/766012 |
Collection | Southern University of Science and Technology, College of Engineering, Department of Electronic and Electrical Engineering |
Recommended Citation (GB/T 7714) | Chen Bingzhen. Design and Optimization of a Convolutional Neural Network Accelerator Based on the Fermat Number Transform [D]. Shenzhen: Southern University of Science and Technology, 2024. |
Files in This Item |
File Name/Size | Document Type | Version | Access | License | Action |
12132103-陈炳臻-电子与电气工程(4315KB) | Degree thesis | -- | Restricted | CC BY-NC-SA | Request full text |