[1] RUSSELL S, NORVIG P. Artificial Intelligence: A Modern Approach[M]. 4th Global ed. Prentice Hall, 2022.
[2] KORTLI Y, JRIDI M, ALFALOU A, et al. Face Recognition Systems: A Survey[J]. Sensors, 2020, 20(2): 342.
[3] NASSIF A B, SHAHIN I, ATTILI I B, et al. Speech Recognition Using Deep Neural Networks: A Systematic Review[J]. IEEE Access, 2019, 7: 19143-19165.
[4] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[5] LU Z, RATHOD V, VOTEL R, et al. RetinaTrack: Online Single Stage Joint Detection and Tracking[C]//2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, June 13-19, 2020. Computer Vision Foundation / IEEE, 2020: 14656-14666.
[6] LIU D, KONG H, LUO X, et al. Bringing AI to edge: From deep learning’s perspective[J]. Neurocomputing, 2022, 485: 297-320.
[7] YANG H, TATE M. A Descriptive Literature Review and Classification of Cloud Computing Research[J]. Commun. Assoc. Inf. Syst., 2012, 31: 2.
[8] SHUKUR H, ZEEBAREE S, ZEBARI R, et al. Cloud computing virtualization of resources allocation for distributed systems[J]. Journal of Applied Science and Technology Trends, 2020, 1(3): 98-105.
[9] SHI W, DUSTDAR S. The Promise of Edge Computing[J]. Computer, 2016, 49(5): 78-81.
[10] SZE V, CHEN Y, YANG T, et al. Efficient Processing of Deep Neural Networks: A Tutorial and Survey[J]. Proc. IEEE, 2017, 105(12): 2295-2329.
[11] CHENG Y, WANG D, ZHOU P, et al. A Survey of Model Compression and Acceleration for Deep Neural Networks[J]. CoRR, 2017, abs/1710.09282.
[12] GUO Y, YAO A, CHEN Y. Dynamic Network Surgery for Efficient DNNs[C]//Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. 2016: 1379-1387.
[13] LI H, KADAV A, DURDANOVIC I, et al. Pruning Filters for Efficient ConvNets[C]//5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
[14] HE Y, LIU P, WANG Z, et al. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration[C]//IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, 2019: 4340-4349.
[15] WANG Q, LI D, HUANG X, et al. Optimizing FFT-Based Convolution on ARMv8 Multi-core CPUs[C]//Lecture Notes in Computer Science: volume 12247 Euro-Par 2020: Parallel Processing - 26th International Conference on Parallel and Distributed Computing, Warsaw, Poland, August 24-28, 2020, Proceedings. Springer, 2020: 248-262.
[16] LI M, LIU Y, LIU X, et al. The Deep Learning Compiler: A Comprehensive Survey[J]. IEEE Trans. Parallel Distributed Syst., 2021, 32(3): 708-727.
[17] CAPRA M, BUSSOLINO B, MARCHISIO A, et al. Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead[J]. IEEE Access, 2020, 8: 225134-225180.
[18] LI S. TensorFlow Lite: On-device machine learning framework[J]. Journal of Computer Research and Development, 2020, 57(9): 1839.
[19] JOUPPI N P, YOUNG C, PATIL N, et al. In-Datacenter Performance Analysis of a Tensor Processing Unit[C]//Proceedings of the 44th Annual International Symposium on Computer Architecture, ISCA 2017, Toronto, ON, Canada, June 24-28, 2017. ACM, 2017: 1-12.
[20] HAN S, KANG J, MAO H, et al. ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA[C]//Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA 2017, Monterey, CA, USA, February 22-24, 2017. ACM, 2017: 75-84.
[21] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Commun. ACM, 2017, 60(6): 84-90.
[22] BENGIO Y, LECUN Y, HINTON G E. Deep learning for AI[J]. Commun. ACM, 2021, 64(7): 58-65.
[23] KIM Y, PARK E, YOO S, et al. Compression of Deep Convolutional Neural Networks for Fast and Low Power Mobile Applications[C]//4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. 2016.
[24] LI G. Deep Model Simplification: Storage Compression and Computation Acceleration[D]. University of Science and Technology of China, 2018.
[25] HE Y, LIU P, WANG Z, et al. Filter Pruning via Geometric Median for Deep Convolutional Neural Networks Acceleration[C]//IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, 2019: 4340-4349.
[26] HE Y, KANG G, DONG X, et al. Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks[C]//Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. ijcai.org, 2018: 2234-2240.
[27] ZHUANG Z, TAN M, ZHUANG B, et al. Discrimination-aware Channel Pruning for Deep Neural Networks[C]//Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada. 2018: 883-894.
[28] ZHAO R, LUK W. Efficient Structured Pruning and Architecture Searching for Group Convolution[C]//2019 IEEE/CVF International Conference on Computer Vision Workshops, ICCV Workshops 2019, Seoul, Korea (South), October 27-28, 2019. IEEE, 2019: 1961-1970.
[29] VYSOGORETS A, KEMPE J. Connectivity matters: Neural network pruning through the lens of effective sparsity[J]. Journal of Machine Learning Research, 2023, 24(99): 1-23.
[30] LIBERIS E, LANE N D. Differentiable Neural Network Pruning to Enable Smart Applications on Microcontrollers[J]. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., 2022, 6(4): 171:1-171:19.
[31] WANG X, WANG J, TANG X, et al. Filter Pruning via Filters Similarity in Consecutive Layers [J]. CoRR, 2023, abs/2304.13397.
[32] HAN S, MAO H, DALLY W J. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding[C]//4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. 2016.
[33] HUBARA I, COURBARIAUX M, SOUDRY D, et al. Binarized Neural Networks[C]// Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, December 5-10, 2016, Barcelona, Spain. 2016: 4107-4115.
[34] ZHU C, HAN S, MAO H, et al. Trained Ternary Quantization[C]//5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
[35] ZHOU S, NI Z, ZHOU X, et al. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients[J]. CoRR, 2016, abs/1606.06160.
[36] ZHOU A, YAO A, WANG K, et al. Explicit Loss-Error-Aware Quantization for Low-Bit Deep Neural Networks[C]//2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. Computer Vision Foundation / IEEE Computer Society, 2018: 9426-9435.
[37] RIGAMONTI R, SIRONI A, LEPETIT V, et al. Learning Separable Filters[C]//2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, June 23-28, 2013. IEEE Computer Society, 2013: 2754-2761.
[38] DENTON E L, ZAREMBA W, BRUNA J, et al. Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation[C]//Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada. 2014: 1269-1277.
[39] JADERBERG M, VEDALDI A, ZISSERMAN A. Speeding up Convolutional Neural Networks with Low Rank Expansions[C]//British Machine Vision Conference, BMVC 2014, Nottingham, UK, September 1-5, 2014. BMVA Press, 2014.
[40] TAI C, XIAO T, WANG X, et al. Convolutional neural networks with low-rank regularization[C]//4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings. 2016.
[41] HINTON G E, VINYALS O, DEAN J. Distilling the Knowledge in a Neural Network[J]. CoRR, 2015, abs/1503.02531.
[42] ZHANG Y, XIANG T, HOSPEDALES T M, et al. Deep Mutual Learning[C]//2018 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2018, Salt Lake City, UT, USA, June 18-22, 2018. Computer Vision Foundation / IEEE Computer Society, 2018: 4320-4328.
[43] MIRZADEH S, FARAJTABAR M, LI A, et al. Improved Knowledge Distillation via Teacher Assistant[C]//The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020. AAAI Press, 2020: 5191-5198.
[44] FU H, ZHOU S, YANG Q, et al. LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding[C]//Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press, 2021: 12830-12838.
[45] WANG K, LIU Z, LIN Y, et al. HAQ: Hardware-Aware Automated Quantization With Mixed Precision[C]//IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, 2019: 8612-8620.
[46] HONG W, LI G, LIU S, et al. Multi-objective evolutionary optimization for hardware-aware neural network pruning[J]. Fundamental Research, 2022.
[47] YANG S, CHEN W, ZHANG X, et al. AUTO-PRUNE: automated DNN pruning and mapping for ReRAM-based accelerator[C]//ICS ’21: 2021 International Conference on Supercomputing, Virtual Event, USA, June 14-17, 2021. ACM, 2021: 304-315.
[48] HE Y, LIN J, LIU Z, et al. AMC: AutoML for Model Compression and Acceleration on Mobile Devices[C]//Lecture Notes in Computer Science: volume 11211 Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part VII. Springer, 2018: 815-832.
[49] YANG T, HOWARD A G, CHEN B, et al. NetAdapt: Platform-Aware Neural Network Adaptation for Mobile Applications[C]//Lecture Notes in Computer Science: volume 11214 Computer Vision - ECCV 2018 - 15th European Conference, Munich, Germany, September 8-14, 2018, Proceedings, Part X. Springer, 2018: 289-304.
[50] YU F, HAN C, WANG P, et al. HFP: Hardware-Aware Filter Pruning for Deep Convolutional Neural Networks Acceleration[C]//25th International Conference on Pattern Recognition, ICPR 2020, Virtual Event / Milan, Italy, January 10-15, 2021. IEEE, 2020: 255-262.
[51] SHEN M, YIN H, MOLCHANOV P, et al. HALP: Hardware-Aware Latency Pruning[J]. CoRR, 2021, abs/2110.10811.
[52] LI W, WANG R, QIAN D. CompactNet: Platform-Aware Automatic Optimization for Convolutional Neural Networks[C]//PMAM@PPoPP 2021: Proceedings of the Twelfth International Workshop on Programming Models and Applications for Multicores and Manycores, Virtual Event, Republic of Korea, 27 February 2021. ACM, 2021: 11-20.
[53] ELSKEN T, METZEN J H, HUTTER F. Neural Architecture Search[M]. Cham: Springer International Publishing, 2019: 63-77.
[54] DAI X, ZHANG P, WU B, et al. ChamNet: Towards Efficient Network Design Through Platform-Aware Model Adaptation[C]//IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, 2019: 11398-11407.
[55] KURTIC E, FRANTAR E, ALISTARH D. ZipLM: Hardware-Aware Structured Pruning of Language Models[J]. CoRR, 2023, abs/2302.04089.
[56] YANG H, ZHU Y, LIU J. ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model[C]//IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2019, Long Beach, CA, USA, June 16-20, 2019. Computer Vision Foundation / IEEE, 2019: 11206-11215.
[57] HUANG K, CHEN S, LI B, et al. Acceleration-Aware Fine-Grained Channel Pruning for Deep Neural Networks via Residual Gating[J]. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 2022, 41(6): 1902-1915.
[58] YANG T, CHEN Y, SZE V. Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017. IEEE Computer Society, 2017: 6071-6079.
[59] XIAO J, ZHANG C, GONG Y, et al. HALOC: Hardware-Aware Automatic Low-Rank Compression for Compact Neural Networks[J]. CoRR, 2023, abs/2301.09422.
[60] ROSENBLATT F. The perceptron, a perceiving and recognizing automaton (Project Para)[R]. Cornell Aeronautical Laboratory, 1957.
[61] AMARI S. Backpropagation and stochastic gradient descent method[J]. Neurocomputing, 1993, 5(3): 185-196.
[62] CUTKOSKY A, MEHTA H. Momentum Improves Normalized SGD[C]//Proceedings of Machine Learning Research: volume 119 Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event. PMLR, 2020: 2260-2268.
[63] DUCHI J C, HAZAN E, SINGER Y. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization[J]. J. Mach. Learn. Res., 2011, 12: 2121-2159.
[64] KINGMA D P, BA J. Adam: A Method for Stochastic Optimization[C]//3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings. 2015.
[65] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proc. IEEE, 1998, 86(11): 2278-2324.
[66] DENG L, LI G, HAN S, et al. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey[J]. Proc. IEEE, 2020, 108(4): 485-532.
[67] HAN S, POOL J, TRAN J, et al. Learning both Weights and Connections for Efficient Neural Network[C]//Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7-12, 2015, Montreal, Quebec, Canada. 2015: 1135-1143.
[68] SRIVASTAVA N, HINTON G E, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. J. Mach. Learn. Res., 2014, 15(1): 1929-1958.
[69] LECUN Y, DENKER J S, SOLLA S A. Optimal Brain Damage[C]//Advances in Neural Information Processing Systems 2, [NIPS Conference, Denver, Colorado, USA, November 27-30, 1989]. Morgan Kaufmann, 1989: 598-605.
[70] HANSON S J, PRATT L Y. Comparing Biases for Minimal Network Construction with Back Propagation[C]//Advances in Neural Information Processing Systems 1, [NIPS Conference, Denver, Colorado, USA, 1988]. Morgan Kaufmann, 1988: 177-185.
[71] HASSIBI B, STORK D G. Second Order Derivatives for Network Pruning: Optimal Brain Surgeon[C]//Advances in Neural Information Processing Systems 5, [NIPS Conference, Denver, Colorado, USA, November 30 - December 3, 1992]. Morgan Kaufmann, 1992: 164-171.
[72] SRINIVAS S, BABU R V. Data-free Parameter Pruning for Deep Neural Networks[C]// Proceedings of the British Machine Vision Conference 2015, BMVC 2015, Swansea, UK, September 7-10, 2015. BMVA Press, 2015: 31.1-31.12.
[73] CHEN W, WILSON J T, TYREE S, et al. Compressing Neural Networks with the Hashing Trick[C]//JMLR Workshop and Conference Proceedings: volume 37 Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015. JMLR.org, 2015: 2285-2294.
[74] ULLRICH K, MEEDS E, WELLING M. Soft Weight-Sharing for Neural Network Compression [C]//5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017.
[75] LIANG T, GLOSSNER J, WANG L, et al. Pruning and quantization for deep neural network acceleration: A survey[J]. Neurocomputing, 2021, 461: 370-403.
[76] FRANKLE J, CARBIN M. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks[C]//7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net, 2019.
[77] HOLLEMANS M. How fast is my model?[EB/OL]. 2018[2018-07-30]. https://machinethink.net/blog/how-fast-is-my-model/.
[78] MARCULESCU D, STAMOULIS D, CAI E. Hardware-aware machine learning: modeling and optimization[C]//Proceedings of the International Conference on Computer-Aided Design, ICCAD 2018, San Diego, CA, USA, November 05-08, 2018. ACM, 2018: 137.
[79] NVIDIA. Jetson Xavier NX[EB/OL]. 2022[2022-08-24]. https://www.nvidia.cn/autonomous-machines/embedded-systems/jetson-xavier-nx/.
[80] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet Classification with Deep Convolutional Neural Networks[C]//Advances in Neural Information Processing Systems 25: Annual Conference on Neural Information Processing Systems 2012, Lake Tahoe, Nevada, USA. 2012: 1106-1114.
[81] LI G, QIAN C, JIANG C, et al. Optimization based Layer-wise Magnitude-based Pruning for DNN Compression[C]//Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. ijcai.org, 2018: 2383-2389.
[82] QIAN C. Distributed Pareto Optimization for Large-Scale Noisy Subset Selection[J]. IEEE Trans. Evol. Comput., 2020, 24(4): 694-707.
[83] KRIZHEVSKY A, HINTON G E, et al. Learning multiple layers of features from tiny images[R]. Toronto, ON, Canada: University of Toronto, 2009.