[1] OTTER D W, MEDINA J R, KALITA J K. A Survey of the Usages of Deep Learning for Natural Language Processing[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(2): 604-624.
[2] GYSEL P, PIMENTEL J, MOTAMEDI M, et al. Ristretto: A Framework for Empirical Study of Resource-Efficient Inference in Convolutional Neural Networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018: 5784-5789.
[3] SANTOS P D, ALVES J C, FERREIRA J C. An FPGA Array for Cellular Genetic Algorithms: Application to the Minimum Energy Broadcast Problem[J]. Microprocessors and Microsystems, 2018, 58: 1-12.
[4] CAI H, WANG T, WU Z, et al. On-Device Image Classification with Proxyless Neural Architecture Search and Quantization-Aware Fine-Tuning[C]//Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), 2019.
[5] WANG K, LIU Z, LIN Y, et al. HAQ: Hardware-Aware Automated Quantization With Mixed Precision[C]//Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[6] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet Classification with Deep Convolutional Neural Networks[J]. Advances in Neural Information Processing Systems, 2012.
[7] SIMONYAN K, ZISSERMAN A. Very Deep Convolutional Networks for Large-Scale Image Recognition[J]. arXiv preprint arXiv:1409.1556, 2014.
[8] SZEGEDY C, LIU W, JIA Y, et al. Going Deeper with Convolutions[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
[9] IOFFE S, SZEGEDY C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift[C]//Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015.
[10] HE K, ZHANG X, REN S, et al. Deep Residual Learning for Image Recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016.
[11] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely Connected Convolutional Networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
[12] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-Level Accuracy with 50x Fewer Parameters and <0.5MB Model Size[J]. arXiv preprint arXiv:1602.07360, 2016.
[13] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications[J]. arXiv preprint arXiv:1704.04861, 2017.
[14] ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices[J]. arXiv preprint arXiv:1707.01083, 2017.
[15] ZOPH B, LE Q V. Neural Architecture Search with Reinforcement Learning[J]. arXiv preprint arXiv:1611.01578, 2016.
[16] MILLER G. Designing Neural Networks Using Genetic Algorithms[C]//Proceedings of the 3rd International Conference on Genetic Algorithms, 1989.
[17] NICKOLLS J R, BUCK I, GARLAND M, et al. Scalable Parallel Programming with CUDA[C]//IEEE Hot Chips 20 Symposium, 2008.
[18] JOUPPI N P, YOUNG C, PATIL N, et al. In-Datacenter Performance Analysis of a Tensor Processing Unit[C]//Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), Toronto, Canada, 2017.
[19] CHEN Y H, EMER J, SZE V, et al. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks[C]//Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), Seoul, South Korea, 2016.
[20] CHEN Y H, KRISHNA T, EMER J S, et al. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks[J]. IEEE Journal of Solid-State Circuits, 2017, 52(1): 127-138.
[21] CHEN T S, DU Z D, SUN N H, et al. DianNao: A Small-Footprint High Throughput Accelerator for Ubiquitous Machine-Learning[J]. ACM SIGPLAN Notices, 2014, 49(4): 269-283.
[22] LUO T, LIU S L, LI L, et al. DaDianNao: A Neural Network Supercomputer[J]. IEEE Transactions on Computers, 2017, 66(1): 73-88.
[23] DU Z D, FASTHUBER R, CHEN T S, et al. ShiDianNao: Shifting Vision Processing Closer to the Sensor[C]//Proceedings of the 42nd ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), Portland, OR, 2015.
[24] LIU D F, CHEN T S, LIU S L, et al. PuDianNao: A Polyvalent Machine Learning Accelerator[J]. ACM SIGPLAN Notices, 2015, 50(4): 369-381.
[25] LIU S L, DU Z D, TAO J H, et al. Cambricon: An Instruction Set Architecture for Neural Networks[C]//Proceedings of the 43rd ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), Seoul, South Korea, 2016.
[26] MO H, ZHU W, HU W, et al. A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error Compensation-Based Prediction[C]//Proceedings of the 2021 IEEE International Solid-State Circuits Conference, 2021.
[27] PEI J, DENG L, SONG S, et al. Towards artificial general intelligence with hybrid Tianjic chip architecture[J]. Nature, 2019, 572(7767): 106-110.
[28] BIRADAR V B, VISHWAS P G, CHETAN C S, et al. Design and Performance Analysis of Modified Unsigned Braun and Signed Baugh-Wooley Multiplier[C]//Proceedings of the International Conference on Electrical, Electronics, Communication, Computer, and Optimization Techniques, Mysuru, India, 2017.
[29] SWEE K L S, HIUNG L H. Performance Comparison Review of Radix-Based Multiplier Designs[C]//Proceedings of the 4th International Conference on Intelligent and Advanced Systems, Kuala Lumpur, Malaysia, 2012.
[30] YKUNTAM Y D, PAVANI K, SALADI K. Design and Analysis of High Speed Wallace Tree Multiplier Using Parallel Prefix Adders for VLSI Circuit Designs[C]//Proceedings of the 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2020.
[31] RIAZ M H, AHMED S A, JAVAID Q, et al. Low Power 4x4 Bit Multiplier Design Using Dadda Algorithm and Optimized Full Adder[C]//Proceedings of the 15th International Bhurban Conference on Applied Sciences and Technology, Islamabad, Pakistan, 2018.
[32] PARK J, KIM Y. Design and Implementation of Ternary Carry Lookahead Adder on FPGA[C]//Proceedings of the 20th International Conference on Electronics, Information, and Communication (ICEIC), South Korea, 2021.
[33] KIM T, JAO W. Circuit Optimization Using Carry-Save-Adder Cells[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 1998.
[34] PARMAR S, SINGH K P. Design of High Speed Hybrid Carry Select Adder[C]//Proceedings of the 2013 IEEE 3rd International Advance Computing Conference (IACC), 2013.
[35] REN P Z, XIAO Y, CHANG X J, et al. A Comprehensive Survey of Neural Architecture Search: Challenges and Solutions[J]. ACM Computing Surveys, 2021, 54(4).
[36] PARASHAR A, RHU M, MUKKARA A, et al. SCNN: An Accelerator for Compressed-Sparse Convolutional Neural Networks[C]//Proceedings of the 44th Annual International Symposium on Computer Architecture (ISCA), 2017.
[37] LIU Z G, WHATMOUGH P, MATTINA M. Sparse Systolic Tensor Array for Efficient CNN Hardware Acceleration[J]. arXiv preprint arXiv:2009.02381, 2020.
[38] UJWALA D, MATHAN N. Review on Performance of Multipliers[J]. Research Journal of Pharmaceutical, Biological and Chemical Sciences, 2017, 8(2): 2668-2672.
[39] HARIKA K, SWETHA B V, RENUKA B, et al. Analysis of Different Multiplication Algorithms & FPGA Implementation[J]. IOSR Journal of VLSI and Signal Processing, 2014, 4(2): 29-35.
[40] PRAJWAL N, AMARESHA S K, YELLAMPALLI S S. Low Power ASIC Implementation of Signed and Unsigned Wallace-Tree with Vedic Multiplier Using Compressors[C]//Proceedings of the International Conference on Smart Technologies for Smart Nation (SmartTechCon), Bengaluru, India, 2017.
[41] KUMM M, GUSTAFSSON O, DE DINECHIN F, et al. Karatsuba with Rectangular Multipliers for FPGAs[C]//Proceedings of the 25th International Symposium on Computer Arithmetic, Amherst, MA, 2018.
[42] DAI L, CHENG Q, WANG Y, et al. An Energy-Efficient Bit-Split-and-Combination Systolic Accelerator for NAS-Based Multi-Precision Convolution Neural Networks[C]//Proceedings of the 27th Asia and South Pacific Design Automation Conference (ASP-DAC), 2022.
[43] LI K, ZHOU J, WANG Y, et al. A Precision-Scalable Energy-Efficient Bit Split-and-Combination Vector Systolic Accelerator for NAS-Optimized DNNs on Edge[C]//Proceedings of the Design, Automation and Test in Europe Conference (DATE), 2022.
[44] SHARMA H, PARK J, SUDA N, et al. Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks[C]//Proceedings of the 45th ACM/IEEE Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, 2018.
[45] CAMUS V, MEI L, ENZ C, et al. Review and Benchmarking of Precision-Scalable Multiply-Accumulate Unit Architectures for Embedded Neural Network Processing[J]. IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 2019, 9(4): 697-711.
[46] JO J, KIM S, PARK I C. Energy-Efficient Convolution Architecture Based on Rescheduled Dataflow[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2018, 65(12): 4196-4207.
[47] SOHN J, SWARTZLANDER E. A Fused Floating-Point Three-Term Adder[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2014, 61(10): 2842-2850.
[48] VEERAMACHANENI S, KRISHNA K, AVINASH L, et al. Novel Architecture for High-Speed and Low-Power 3-2, 4-2 and 5-2 Compressors[C]//Proceedings of the International Conference on VLSI Design, 2007.
[49] NAJAFI A, MAZLOOM-NEZHAD B, NAJAFI A. Low-Power and High-Speed 4-2 Compressor[C]//Proceedings of the International Convention on Information and Communication Technology, Electronics and Microelectronics, 2013.
[50] KUMAR S, KUMAR M, et al. 4-2 Compressor Design with New XOR-XNOR Module[C]//Proceedings of the Fourth International Conference on Advanced Computing & Communication Technologies, 2014.
[51] SHOMRON G, HOROWITZ T, WEISER U. SMT-SA: Simultaneous Multithreading in Systolic Arrays[J]. IEEE Computer Architecture Letters, 2019, 18(2): 99-102.
[52] SHARIFY S, LASCORZ A D, MAHMOUD M, et al. Laconic Deep Learning Inference Acceleration[C]//Proceedings of the 46th International Symposium on Computer Architecture (ISCA), Phoenix, AZ, 2019.