[1] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks[J]. Advances in neural information processing systems, 2012, 25: 1106-1114.
[2] ZHUANG B, SHEN C, TAN M, et al. Towards effective low-bitwidth convolutional neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7920-7928.
[3] COLLOBERT R, WESTON J, BOTTOU L, et al. Natural language processing (almost) from scratch[J]. Journal of machine learning research, 2011, 12: 2493-2537.
[4] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[A]. 2014.
[5] REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: Unified, real-time object detection[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 779-788.
[6] SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.
[7] HAN S, MAO H, DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding[A]. 2015.
[8] HOWARD A G, ZHU M, CHEN B, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications[A]. 2017.
[9] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[A]. 2015.
[10] JACOB B, KLIGYS S, CHEN B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 2704-2713.
[11] MCKINSTRY J L, ESSER S K, APPUSWAMY R, et al. Discovering low-precision networks close to full-precision networks for efficient embedded inference[A]. 2018.
[12] ZHOU Y, MOOSAVI-DEZFOOLI S M, CHEUNG N M, et al. Adaptive quantization for deep neural network[C]//Proceedings of the AAAI Conference on Artificial Intelligence: volume 32. 2018.
[13] FROMM J, PATEL S, PHILIPOSE M. Heterogeneous bitwidth binarization in convolutional neural networks[J]. Advances in neural information processing systems, 2018, 31: 4006-4015.
[14] LI H, DE S, XU Z, et al. Training quantized nets: A deeper understanding[J]. Advances in neural information processing systems, 2017, 30: 5813-5823.
[15] COURBARIAUX M, BENGIO Y, DAVID J P. Binaryconnect: Training deep neural networks with binary weights during propagations[J]. Advances in neural information processing systems, 2015, 28: 3123-3131.
[16] RASTEGARI M, ORDONEZ V, REDMON J, et al. Xnor-net: Imagenet classification using binary convolutional neural networks[C]//European conference on computer vision. Springer, 2016: 525-542.
[17] ZHOU S, WU Y, NI Z, et al. Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients[A]. 2016.
[18] ZHUANG B, SHEN C, TAN M, et al. Towards effective low-bitwidth convolutional neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 7920-7928.
[19] WANG P, HU Q, ZHANG Y, et al. Two-step quantization for low-bit neural networks[C]//Proceedings of the IEEE Conference on computer vision and pattern recognition. 2018: 4376-4384.
[20] PARK E, KIM D, YOO S. Energy-efficient neural network accelerator based on outlier-aware low-precision computation[C]//2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2018: 688-698.
[21] LAI L, SUDA N, CHANDRA V. Deep convolutional neural network inference with floating point weights and fixed-point activations[A]. 2017.
[22] HE Q, WEN H, ZHOU S, et al. Effective quantization methods for recurrent neural networks[A]. 2016.
[23] JUDD P, ALBERICIO J, HETHERINGTON T, et al. Stripes: Bit-serial deep neural network computing[C]//2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2016: 1-12.
[24] WANG K, LIU Z, LIN Y, et al. Haq: Hardware-aware automated quantization with mixed precision[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 8612-8620.
[25] HAN S, MAO H, DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding[A]. 2015.
[26] LIN J, RAO Y, LU J, et al. Runtime neural pruning[J]. Advances in neural information processing systems, 2017, 30: 2178-2188.
[27] ZHU C, HAN S, MAO H, et al. Trained ternary quantization[A]. 2016.
[28] CHOI J, WANG Z, VENKATARAMANI S, et al. Pact: Parameterized clipping activation for quantized neural networks[A]. 2018.
[29] JACOB B, KLIGYS S, CHEN B, et al. Quantization and training of neural networks for efficient integer-arithmetic-only inference[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 2704-2713.
[30] HAN S, MAO H, DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding[A]. 2015.
[31] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
[32] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[A]. 2015.
[33] JOUPPI N P, YOUNG C, PATIL N, et al. In-datacenter performance analysis of a tensor processing unit[C]//Proceedings of the 44th annual international symposium on computer architecture. 2017: 1-12.
[34] CHEN Y H, EMER J, SZE V. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks[J]. ACM SIGARCH computer architecture news, 2016, 44(3): 367-379.
[35] DAI L, CHENG Q, WANG Y, et al. An Energy-Efficient Bit-Split-and-Combination Systolic Accelerator for NAS-Based Multi-Precision Convolution Neural Networks[C]//2022 27th Asia and South Pacific Design Automation Conference (ASP-DAC). 2022: 448-453.
[36] MAO W, LI K, XIE X, et al. A Reconfigurable Multiple-Precision Floating-Point Dot Product Unit for High-Performance Computing[C]//2021 Design, Automation & Test in Europe Conference & Exhibition (DATE). 2021: 1793-1798.
[37] MAO W, LI K, CHENG Q, et al. A Configurable Floating-Point Multiple-Precision Processing Element for HPC and AI Converged Computing[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022, 30(2): 213-226.
[38] LI K, MAO W, XIE X, et al. Multiple-Precision Floating-Point Dot Product Unit for Efficient Convolution Computation[C]//2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS). 2021: 1-4.
[39] SHARMA H, PARK J, SUDA N, et al. Bit fusion: Bit-level dynamically composable architecture for accelerating deep neural network[C]//2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2018: 764-775.
[40] MOONS B, UYTTERHOEVEN R, DEHAENE W, et al. 14.5 envision: A 0.26-to-10tops/w subword-parallel dynamic-voltage-accuracy-frequency-scalable convolutional neural network processor in 28nm fdsoi[C]//2017 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2017: 246-247.
[41] SANKARADAS M, JAKKULA V, CADAMBI S, et al. A massively parallel coprocessor for convolutional neural networks[C]//2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors. IEEE, 2009: 53-60.
[42] SRIRAM V, COX D, TSOI K H, et al. Towards an embedded biologically-inspired machine vision processor[C]//2010 International Conference on Field-Programmable Technology. IEEE, 2010: 273-278.
[43] CHAKRADHAR S, SANKARADAS M, JAKKULA V, et al. A dynamically configurable coprocessor for convolutional neural networks[C]//Proceedings of the 37th annual international symposium on Computer architecture. 2010: 247-257.
[44] PEEMEN M, SETIO A A, MESMAN B, et al. Memory-centric accelerator design for convolutional neural networks[C]//2013 IEEE 31st International Conference on Computer Design (ICCD). IEEE, 2013: 13-19.
[45] GOKHALE V, JIN J, DUNDAR A, et al. A 240 g-ops/s mobile coprocessor for deep neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2014: 682-687.
[46] GUPTA S, AGRAWAL A, GOPALAKRISHNAN K, et al. Deep learning with limited numerical precision[C]//International conference on machine learning. PMLR, 2015: 1737-1746.
[47] ZHANG C, LI P, SUN G, et al. Optimizing fpga-based accelerator design for deep convolutional neural networks[C]//Proceedings of the 2015 ACM/SIGDA international symposium on field programmable gate arrays. 2015: 161-170.
[48] CHEN T, DU Z, SUN N, et al. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning[J]. ACM SIGARCH Computer Architecture News, 2014, 42(1): 269-284.
[49] DU Z, FASTHUBER R, CHEN T, et al. ShiDianNao: Shifting vision processing closer to the sensor[C]//Proceedings of the 42nd Annual International Symposium on Computer Architecture. 2015: 92-104.
[50] CHEN Y, LUO T, LIU S, et al. Dadiannao: A machine-learning supercomputer[C]//2014 47th Annual IEEE/ACM International Symposium on Microarchitecture. IEEE, 2014: 609-622.
[51] YOO H J, PARK S, BONG K, et al. A 1.93 tops/w scalable deep learning/inference processor with tetra-parallel mimd architecture for big data applications[C]//2015 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 2015: 80-81.
[52] CAVIGELLI L, GSCHWEND D, MAYER C, et al. Origami: A convolutional network accelerator[C]//Proceedings of the 25th edition on Great Lakes Symposium on VLSI. 2015: 199-204.
[53] WU B, WANG Y, ZHANG P, et al. Mixed precision quantization of convnets via differentiable neural architecture search[A]. 2018.
[54] CAI H, ZHU L, HAN S. Proxylessnas: Direct neural architecture search on target task and hardware[A]. 2018.
[55] WU B, DAI X, ZHANG P, et al. Fbnet: Hardware-aware efficient convnet design via differentiable neural architecture search[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019: 10734-10742.
[56] REN A, ZHANG T, YE S, et al. Admm-nn: An algorithm-hardware co-design framework of dnns using alternating direction methods of multipliers[C]//Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. 2019: 925-938.
[57] DING C, LIAO S, WANG Y, et al. Circnn: Accelerating and compressing deep neural networks using block-circulant weight matrices[C]//Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. 2017: 395-408.
[58] WEN W, WU C, WANG Y, et al. Learning structured sparsity in deep neural networks[J]. Advances in neural information processing systems, 2016, 29: 2074-2082.
[59] RYU S, KIM H, YI W, et al. Bitblade: Area and energy-efficient precision-scalable neural network accelerator with bitwise summation[C]//Proceedings of the 56th Annual Design Automation Conference 2019. 2019: 1-6.