[1] CHANG Z, LIU S, XIONG X, et al. A survey of recent advances in edge-computing-powered artificial intelligence of things[J]. IEEE Internet of Things Journal, 2021, 8(18): 13849-13875.
[2] JIA L, ZHOU Z, XU F, et al. Cost-efficient continuous edge learning for artificial intelligence of things[J]. IEEE Internet of Things Journal, 2021, 9(10): 7325-7337.
[3] BOROUJERDIAN B, GENC H, KRISHNAN S, et al. Why compute matters for UAV energy efficiency?[Z]. 2018.
[4] HAO C, DOTZEL J, XIONG J, et al. Enabling design methodologies and future trends for edge AI: Specialization and codesign[J]. IEEE Design & Test, 2021, 38(4): 7-26.
[5] LIANG T, GLOSSNER J, WANG L, et al. Pruning and quantization for deep neural network acceleration: A survey[J]. Neurocomputing, 2021, 461: 370-403.
[6] LAI L, SUDA N, CHANDRA V. CMSIS-NN: Efficient neural network kernels for Arm Cortex-M CPUs[A]. 2018.
[7] HAN S, MAO H, DALLY W J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding[A]. 2015.
[8] YAO S, ZHAO Y, ZHANG A, et al. DeepIoT: Compressing deep neural network structures for sensing systems with a compressor-critic framework[C]//Proceedings of the 15th ACM conference on embedded network sensor systems. 2017: 1-14.
[9] HINTON G, VINYALS O, DEAN J. Distilling the knowledge in a neural network[A]. 2015.
[10] DAVID R, DUKE J, JAIN A, et al. TensorFlow Lite Micro: Embedded machine learning for TinyML systems[J]. Proceedings of Machine Learning and Systems, 2021, 3: 800-811.
[11] ARM. Arm NN SDK–Arm®[EB/OL]. (2024-3-1)[2024-3-1]. https://www.arm.com/products/silicon-ip-cpu/ethos/arm-nn.
[12] STMICROELECTRONICS. X-CUBE-AI - AI expansion pack for STM32CubeMX–STMicroelectronics[EB/OL]. (2024-3-1)[2024-3-1]. https://www.st.com/en/embedded-software/x-cube-ai.html.
[13] CAPOTONDI A, RUSCI M, FARISELLI M, et al. CMix-NN: Mixed low-precision CNN library for memory-constrained edge devices[J]. IEEE Transactions on Circuits and Systems II: Express Briefs, 2020, 67(5): 871-875.
[14] IANDOLA F N, HAN S, MOSKEWICZ M W, et al. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size[A]. 2016.
[15] HOWARD A G, ZHU M, CHEN B, et al. MobileNets: Efficient convolutional neural networks for mobile vision applications[A]. 2017.
[16] ZHANG X, ZHOU X, LIN M, et al. ShuffleNet: An extremely efficient convolutional neural network for mobile devices[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2018: 6848-6856.
[17] STRUBELL E, GANESH A, MCCALLUM A. Energy and policy considerations for deep learning in NLP[A]. 2019.
[18] GHOLAMI A, KWON K, WU B, et al. SqueezeNext: Hardware-aware neural network design[C]//Proceedings of the IEEE conference on computer vision and pattern recognition workshops. 2018: 1638-1647.
[19] LIN J, CHEN W M, LIN Y, et al. MCUNet: Tiny deep learning on IoT devices[J]. Advances in Neural Information Processing Systems, 2020, 33: 11711-11722.
[20] TAN M, CHEN B, PANG R, et al. MnasNet: Platform-aware neural architecture search for mobile[C]//Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2019: 2820-2828.
[21] LAVIN A, GRAY S. Fast algorithms for convolutional neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 4013-4021.
[22] YANG T, LIAO Y, SHI J, et al. A Winograd-based CNN accelerator with a fine-grained regular sparsity pattern[C]//2020 30th International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2020: 254-261.
[23] LIU X, POOL J, HAN S, et al. Efficient sparse-Winograd convolutional neural networks[A]. 2018.
[24] OLSON L E, HILL M D, WOOD D A. Crossing guard: Mediating host-accelerator coherence interactions[J]. ACM SIGARCH Computer Architecture News, 2017, 45(1): 163-176.
[25] LIM S H, SUH W W, KIM J Y, et al. RISC-V Virtual Platform-Based Convolutional Neural Network Accelerator Implemented in SystemC[J]. Electronics, 2021, 10(13): 1514.
[26] MELONI P, GARUFI A, DERIU G, et al. CNN hardware acceleration on a low-power and low-cost APSoC[C]//2019 Conference on Design and Architectures for Signal and Image Processing (DASIP). IEEE, 2019: 7-12.
[27] LOUIS M S, AZAD Z, DELSHADTEHRANI L, et al. Towards deep learning using TensorFlow Lite on RISC-V[C]//Third Workshop on Computer Architecture Research with RISC-V (CARRV): Vol. 1. 2019: 6.
[28] GAROFALO A, TAGLIAVINI G, CONTI F, et al. XpulpNN: Accelerating quantized neural networks on RISC-V processors through ISA extensions[C]//2020 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 2020: 186-191.
[29] LI Z, HU W, CHEN S. Design and implementation of CNN custom processor based on RISC-V architecture[C]//2019 IEEE 21st International Conference on High Performance Computing and Communications; IEEE 17th International Conference on Smart City; IEEE 5th International Conference on Data Science and Systems (HPCC/SmartCity/DSS). IEEE, 2019: 1945-1950.
[30] LOU W, WANG C, GONG L, et al. RV-CNN: Flexible and efficient instruction set for CNNs based on RISC-V processors[C]//Advanced Parallel Processing Technologies: 13th International Symposium, APPT 2019, Tianjin, China, August 15–16, 2019, Proceedings 13. Springer, 2019: 3-14.
[31] LI D Z, GONG H R, CHANG Y C. Implementing RISC-V system-on-chip for acceleration of convolution operation and activation function based on FPGA[C]//2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT). IEEE, 2018: 1-3.
[32] WU N, JIANG T, ZHANG L, et al. A reconfigurable convolutional neural network-accelerated coprocessor based on RISC-V instruction set[J]. Electronics, 2020, 9(6): 1005.
[33] FENG S, WU J, ZHOU S, et al. The implementation of LeNet-5 with NVDLA on RISC-V SoC[C]//2019 IEEE 10th International Conference on Software Engineering and Service Science (ICSESS). IEEE, 2019: 39-42.
[34] LIAO Hansong. Research and design of a RISC-V-based application-specific instruction set processor for convolutional neural networks[D]. South China University of Technology, 2020.
[35] WANG Song. Design of a system-on-chip based on RISC-V with a CNN coprocessor[D]. Xidian University, 2020.
[36] LI W, CHEN H, HUANG M, et al. Winograd algorithm for AdderNet[C]//International Conference on Machine Learning. PMLR, 2021: 6307-6315.
[37] XU W, ZHANG Z, YOU X, et al. Reconfigurable and low-complexity accelerator for convolutional and generative networks over finite fields[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 39(12): 4894-4907.
[38] MA X, LIN S, YE S, et al. Non-structured DNN weight pruning—Is it beneficial in any platform?[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 33(9): 4930-4944.
[39] HAO C, CHEN D. Deep neural network model and FPGA accelerator co-design: Opportunities and challenges[C]//2018 14th IEEE International Conference on Solid-State and Integrated Circuit Technology (ICSICT). IEEE, 2018: 1-4.
[40] HAO C, ZHANG X, LI Y, et al. FPGA/DNN co-design: An efficient design methodology for IoT intelligence on the edge[C]//Proceedings of the 56th Annual Design Automation Conference 2019. 2019: 1-6.
[41] PULP. PULP - An Open Parallel Ultra-Low-Power Processing-Platform[EB/OL]. (2024-3-1)[2024-3-1]. http://iis-projects.ee.ethz.ch/index.php/PULP.
[42] GAROFALO A, RUSCI M, CONTI F, et al. PULP-NN: accelerating quantized neural networks on parallel ultra-low-power RISC-V processors[J]. Philosophical Transactions of the Royal Society A, 2020, 378(2164): 20190155.
[43] COUSSY P, GAJSKI D D, MEREDITH M, et al. An introduction to high-level synthesis[J]. IEEE Design & Test of Computers, 2009, 26(4): 8-17.
[44] MLCOMMONS. Machine learning innovation to benefit everyone[EB/OL]. (2024-3-1)[2024-3-1]. https://mlcommons.org/en/.
[45] ZHANG X, WANG J, ZHU C, et al. DNNBuilder: An automated tool for building high-performance DNN hardware accelerators for FPGAs[C]//2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 2018: 1-8.
[46] XU P, ZHANG X, HAO C, et al. AutoDNNchip: An automated DNN chip predictor and builder for both FPGAs and ASICs[C]//Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. 2020: 40-50.
[47] DE PRADO M, MUNDY A, SAEED R, et al. Automated design space exploration for optimized deployment of DNN on Arm Cortex-A CPUs[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2020, 40(11): 2293-2305.
[48] MIPS. MIPS DSP–MIPS[EB/OL]. (2024-3-1)[2024-3-1]. https://mips.com/products/architectures/ase/dsp/.
[49] MIPS. MIPS SIMD–MIPS[EB/OL]. (2024-3-1)[2024-3-1]. https://mips.com/products/architectures/ase/simd/.
[50] ARM. DSP capabilities of Cortex-M4 and Cortex-M7[EB/OL]. (2024-3-1)[2024-3-1]. https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/white-paper-dsp-capabilities-of-cortex-m4-and-cortex-m7.
[51] RISC-V. riscv/riscv-p-spec: RISC-V Packed SIMD Extension[EB/OL]. (2022-3-30)[2024-3-1]. https://github.com/riscv/riscv-p-spec/.
[52] ARM. CMSIS DSP Software Library[EB/OL]. (2024-3-1)[2024-3-1]. https://arm-software.github.io/CMSIS_5/DSP/html/index.html.
[53] RISC-V. riscv/riscv-v-spec: Working draft of the proposed RISC-V V vector extension[EB/OL]. (2024-1-31)[2024-3-1]. https://github.com/riscv/riscv-v-spec/.
[54] GAUTSCHI M, SCHIAVONE P D, TRABER A, et al. Near-threshold RISC-V core with DSP extensions for scalable IoT endpoint devices[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2017, 25(10): 2700-2713.
[55] LOWRISC. lowRISC/ibex: Ibex is a small 32 bit RISC-V CPU core, previously known as zero-riscy[EB/OL]. (2024-3-1)[2024-3-1]. https://github.com/lowRISC/ibex.
[56] OPENHW GROUP. openhwgroup/cv32e40p: CV32E40P is an in-order 4-stage RISC-V RV32IMFCXpulp CPU based on RI5CY from PULP-Platform[EB/OL]. (2024-2-14)[2024-3-1]. https://github.com/openhwgroup/cv32e40p.
[57] FLAMAND E, ROSSI D, CONTI F, et al. GAP-8: A RISC-V SoC for AI at the Edge of the IoT[C]//2018 IEEE 29th International Conference on Application-specific Systems, Architectures and Processors (ASAP). IEEE, 2018: 1-4.
[58] NUCLEI SYSTEM TECHNOLOGY. riscv-mcu/e203_hbirdv2: The Ultra-Low Power RISC-V Core[EB/OL]. (2023-3-27)[2024-3-1]. https://github.com/riscv-mcu/e203_hbirdv2.
[59] SPINALHDL. SpinalHDL/VexRiscv: A FPGA friendly 32 bit RISC-V CPU implementation[EB/OL]. (2024-2-1)[2024-3-1]. https://github.com/SpinalHDL/VexRiscv.
[60] AYACHI R, AFIF M, SAID Y, et al. Strided convolution instead of max pooling for memory efficiency of convolutional neural networks[C]//Proceedings of the 8th International Conference on Sciences of Electronics, Technologies of Information and Telecommunications (SETIT’18), Vol. 1. Springer, 2020: 234-243.
[61] HSIAO T Y, CHANG Y C, CHOU H H, et al. Filter-based deep-compression with global average pooling for convolutional networks[J]. Journal of Systems Architecture, 2019, 95: 9-18.
[62] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25.
[63] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[A]. 2014.
[64] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
[65] NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY. THE MNIST DATABASE of handwritten digits[EB/OL]. (2024-3-1)[2024-3-1]. http://yann.lecun.com/exdb/mnist/.
[66] KRIZHEVSKY A. CIFAR-10 and CIFAR-100 datasets: The CIFAR-10 dataset[EB/OL]. (2024-3-1)[2024-3-1]. http://www.cs.toronto.edu/~kriz/cifar.html.
[67] CHOWDHERY A, WARDEN P, SHLENS J, et al. Visual wake words dataset[A]. 2019.
[68] DENG J, DONG W, SOCHER R, et al. ImageNet: A large-scale hierarchical image database[C]//2009 IEEE conference on computer vision and pattern recognition. IEEE, 2009: 248-255.
[69] ZALANDO RESEARCH. zalandoresearch/fashion-mnist: A MNIST-like fashion product database. Benchmark[EB/OL]. (2022-3-21)[2024-3-1]. https://github.com/zalandoresearch/fashion-mnist.
[70] LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[71] MICROSOFT. COCO - Common Objects in Context[EB/OL]. (2022-3-21)[2024-3-1]. https://cocodataset.org/#home.
[72] MLCOMMONS. Benchmark MLPerf Inference: Tiny | MLCommons V1.1 Results[EB/OL]. (2024-3-1)[2024-3-1]. https://mlcommons.org/benchmarks/inference-tiny/.
[73] SIPEED. sipeed/TinyMaix: TinyMaix is a tiny inference library for microcontrollers (TinyML)[EB/OL]. (2023-4-26)[2024-3-1]. https://github.com/sipeed/TinyMaix.
[74] SIPEED. TinyMaix/benchmark: Test Models and Test Record[EB/OL]. (2023-4-13)[2024-3-1]. https://github.com/sipeed/TinyMaix/blob/main/benchmark.md.