Title

Design of a Multi-Precision ReRAM Computing-in-Memory Accelerator Based on a Charge-Domain Accumulation Readout Circuit

Alternative Title
DESIGN OF MIXED-PRECISION RERAM COMPUTING-IN-MEMORY ACCELERATOR BASED ON CHARGE DOMAIN ACCUMULATION AND READOUT CIRCUIT
Name
Name (Pinyin)
LIU Jun
Student ID
12132458
Degree Type
Master
Degree Program
0856 Materials and Chemical Engineering
Subject Category / Professional Degree Category
0856 Materials and Chemical Engineering
Supervisor
毛伟
Supervisor Affiliation
深港微电子学院
Thesis Defense Date
2023-05-18
Thesis Submission Date
2023-06-28
Degree-Granting Institution
Southern University of Science and Technology
Place of Degree Conferral
Shenzhen
Abstract
In recent years, convolutional neural networks (CNNs) have demonstrated powerful capabilities in fields such as image recognition, object detection, and autonomous driving. However, CNN models have grown ever larger and more structurally complex, and traditional hardware accelerators based on the von Neumann architecture, constrained by the "memory wall" and "power wall" bottlenecks, can no longer scale their compute fast enough to keep pace with the evolution of neural network (NN) models.
To accelerate CNN inference, on the software side this thesis applies a neural architecture search (NAS) algorithm to optimize the network structure and to compress both the number and the precision of the weights, quantizing the CNN into a multi-precision network with 2-bit, 4-bit, and 8-bit weights. On the hardware side, exploiting the advantages of the non-volatile memory (NVM) device ReRAM, this thesis proposes an accelerator based on a computing-in-memory (CIM) architecture, which promises to break the von Neumann bottleneck: by fusing storage and computation, CIM greatly reduces the power and latency of memory accesses. Nearly 99% of the computation in a CNN consists of multiply-accumulate (MAC) operations, which demands strong parallel computing capability from a CIM accelerator; this thesis therefore proposes a charge-domain CIM macro that isolates the nonlinear resistance deviation of ReRAM, improving system throughput and energy efficiency at the cell level. In conventional designs, the accumulation readout circuit, comprising a digital shift-accumulate module and an analog-to-digital conversion module, consumes about 70% of the total system power. This thesis optimizes the accumulation readout circuit by proposing a multi-precision accumulator based on charge-domain computation, which applies the binary weighting of input bits and weight bits while supporting parallel readout, and a charge-domain readout circuit that achieves energy-efficient analog-to-digital conversion; a scheme that reduces the number of analog-to-digital converter (ADC) comparison cycles further lowers the ADC power in the readout circuit.
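The abstract does not detail the quantizer, so as a generic illustration of quantizing weights to the 2/4/8-bit multi-precision integers mentioned above, here is a symmetric per-tensor uniform quantization sketch in Python (the function name, rounding scheme, and clipping policy are assumptions for illustration, not the thesis's NAS-driven method):

```python
def quantize_uniform(weights, bits):
    """Symmetric per-tensor uniform quantization of real-valued weights
    to signed `bits`-bit integers, returning (quantized values, scale)."""
    qmax = (1 << (bits - 1)) - 1                     # e.g. 127 for 8-bit, 1 for 2-bit
    scale = (max(abs(w) for w in weights) or 1.0) / qmax
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

# Dequantized values approximate the originals: w ~= q * scale
ws = [0.5, -0.25, 1.0, -1.0]
q8, s8 = quantize_uniform(ws, 8)   # 8-bit: fine-grained levels
q2, s2 = quantize_uniform(ws, 2)   # 2-bit: only levels {-1, 0, +1}
```

Lower bit widths shrink both storage and the per-cell conductance levels a ReRAM array must hold, at the cost of quantization error that the NAS search trades off per layer.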
Each circuit block was simulated for function and performance in an SMIC 28 nm process, and the simulation results matched expectations. The crossbar array built from the CIM macro achieves a maximum row parallelism of 1536. The multi-precision accumulator, used for post-weighting in the analog domain, reduces the number of ADCs by 87.5%. The ADC comparison-cycle reduction scheme achieves up to an 84.3% reduction in ADC power and a 21.3% reduction in accumulation readout circuit power. A network verification environment was built with MemTorch, a PyTorch-based simulator, and the inference performance of the proposed accelerator was tested with a NAS-optimized ResNet-18 on the CIFAR-10 dataset, reaching a Top-1 inference accuracy of 93.46%. The designed CIM accelerator achieves a multi-precision average energy efficiency of 180.61 TOPS/W, a 7.81x to 15.21x improvement over other multi-precision CIM accelerators.
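The multi-bit MAC that the charge-domain accumulator performs in analog can be sketched digitally: each pairing of input-bit plane i and weight-bit plane j yields a 1-bit dot product that contributes with binary weight 2^(i+j), which is exactly the shift-accumulate step the charge-domain circuit replaces. A minimal Python sketch, assuming unsigned operands (names and bit widths are illustrative, not the thesis's implementation):

```python
def mac_bit_serial(inputs, weights, in_bits=8, w_bits=8):
    """Dot product computed from bit-wise partial sums, as a CIM crossbar would:
    for each (input-bit i, weight-bit j) pair, a 1-bit dot product is taken
    across all rows in parallel, then accumulated with binary weight 2**(i+j)."""
    acc = 0
    for i in range(in_bits):
        for j in range(w_bits):
            # 1-bit x 1-bit products summed down one column (row-parallel in hardware)
            partial = sum(((x >> i) & 1) * ((w >> j) & 1)
                          for x, w in zip(inputs, weights))
            acc += partial << (i + j)   # digital shift-accumulate / analog weighting
    return acc

# The bit-serial result equals the direct integer dot product
xs = [3, 17, 250, 96]
ws = [5, 2, 1, 200]
assert mac_bit_serial(xs, ws) == sum(x * w for x, w in zip(xs, ws))
```

Because the weighting of all bit planes is linear, it can be applied before analog-to-digital conversion; performing it in the charge domain is what lets the design share one ADC across many bit planes and cut the ADC count.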
Keywords
Language
Chinese
Training Category
Independent training
Year of Enrollment
2021
Year of Degree Conferral
2023-06
Degree Assessment Subcommittee
Materials and Chemical Engineering
Chinese Library Classification
TN492
Source Repository
Manual submission
Output Type
Dissertation
Item Identifier
http://sustech.caswiz.com/handle/2SGJ60CL/544463
Collection
南方科技大学-香港科技大学深港微电子学院筹建办公室
Recommended Citation
GB/T 7714
刘俊. 基于电荷域累加读出电路的多精度ReRAM存算加速器设计[D]. 深圳: 南方科技大学, 2023.