南方科技大学知识苑(SUSTech KC): HIGH-PRECISION STEREO MATCHING ACCELERATOR BASED ON GRADIENT INFORMATION

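Title	HIGH-PRECISION STEREO MATCHING ACCELERATOR BASED ON GRADIENT INFORMATION
Alternative title	Design of a High-Precision Stereo Matching Accelerator Based on Gradient Information (基于梯度信息的高精度立体匹配加速器设计)
Name	Li Ke (李可)
Name (pinyin)	LI Ke
Student ID	12132452
Degree type	Master's
Degree program	0809 Electronic Science and Technology
Discipline category / professional degree category	08 Engineering
Supervisor	An Fengwei (安丰伟)
Supervisor's affiliation	Shenzhen-Hong Kong School of Microelectronics (深港微电子学院)
Thesis defense date	2024-05-16
Thesis submission date	2024-06-16
Degree-granting institution	Southern University of Science and Technology (南方科技大学)
Place of degree conferral	Shenzhen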
Abstract	Stereoscopic vision mimics the human binocular system to perceive depth and is widely used in autonomous driving, 3D reconstruction, and facial recognition. Designing a dedicated hardware processor for stereoscopic vision requires balancing high accuracy against high speed and low resource consumption. To address the matching problem in stereoscopic vision, this thesis proposes a novel semi-global stereo matching hardware accelerator that improves the accuracy of disparity processing and optimizes hardware performance through four cooperating stages. The specific work and innovations are as follows. First, in the initial cost calculation stage, a pipelined cost calculation architecture based on gradient information is proposed to improve the robustness of stereo matching. Adding a gradient calculation module drastically reduces the row buffer requirement and saves more than 60% of resources. The exponential function is implemented in hardware by linear approximation, which significantly reduces resource usage while minimizing precision loss. Second, in the cost aggregation stage, dual-path cost aggregation is combined with guided filter aggregation. The aggregation path is optimized, and the critical path of the aggregation calculation is split across multiple clock cycles, which raises the system's operating frequency. In addition, the guided filter computation is streamlined and division is replaced with multiplication, improving both computational efficiency and resource usage. Third, the disparity calculation stage introduces sub-pixel interpolation based on SRT division, using additional fixed-point fractional bits to refine disparities and thereby improve the accuracy of the disparity map for distant objects; this yields an average accuracy gain of 2% on the KITTI2015 dataset. Finally, in the post-processing stage, an efficient hardware architecture for synchronous hole filling and median filtering is designed, and reusing cascaded row buffers yields significant resource savings. Two different outputs can also be selected for occluded regions to meet diverse application requirements. In the performance evaluation, the design achieves an average error rate of 5.26% on the KITTI2015 dataset, 1.62 percentage points lower than other semi-global stereo matching work. At 1920×1080 resolution with a 128-disparity range on the Stratix-V FPGA platform, the architecture requires only 94,510 LUTs, significantly fewer than other studies. The system operates at 125 MHz and achieves a throughput of 60 frames per second.
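The sub-pixel refinement described in the abstract can be illustrated with a small software sketch. The record does not give the thesis's exact interpolation formula, so the following assumes the common parabola fit through the minimum cost and its two neighbours, and it uses ordinary integer division where the hardware would use an SRT divider. The names `subpixel_disparity` and `FRAC_BITS`, and the choice of 4 fractional bits, are hypothetical.

```python
FRAC_BITS = 4  # assumed number of fixed-point fractional bits (not from the thesis)


def subpixel_disparity(costs, frac_bits=FRAC_BITS):
    """Return the winning disparity in fixed point (value * 2**frac_bits).

    costs[d] is the aggregated matching cost at integer disparity d.
    A parabola is fitted through the minimum and its two neighbours;
    the integer division below stands in for a hardware SRT divider.
    """
    d = min(range(len(costs)), key=costs.__getitem__)  # winner-takes-all
    base = d << frac_bits
    if d == 0 or d == len(costs) - 1:
        return base  # no neighbour on one side, keep the integer result

    c_prev, c_min, c_next = costs[d - 1], costs[d], costs[d + 1]
    denom = 2 * (c_prev - 2 * c_min + c_next)  # >= 0 because c_min is the minimum
    if denom == 0:
        return base  # flat cost curve, nothing to refine

    numer = (c_prev - c_next) << frac_bits
    # Round to nearest, symmetric around zero.
    offset = (abs(numer) + denom // 2) // denom
    return base + offset if numer >= 0 else base - offset


if __name__ == "__main__":
    q = subpixel_disparity([9, 4, 1, 2, 7])
    print(q / (1 << FRAC_BITS))  # 2.25
```

With 4 fractional bits the disparity is resolved in steps of 1/16 pixel. Because depth is inversely proportional to disparity, a fractional step matters most at small disparities, which is consistent with the abstract's remark about distant objects.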
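Other abstract	Binocular stereo vision is a technique that mimics the human eyes to perceive distance and is widely used in autonomous vehicles, 3D reconstruction, face recognition, and other fields. Designing a dedicated binocular stereo vision hardware processor involves the challenge of balancing high accuracy, high speed, and low resource consumption. To address the matching problem in binocular stereo vision, this thesis proposes a new semi-global stereo matching hardware accelerator that significantly improves the accuracy of disparity processing and optimizes hardware performance by executing four key steps in cooperation. The specific work and innovations are as follows. First, in the initial cost calculation stage, a pipelined initial cost calculation architecture based on gradient information is proposed to improve the robustness of stereo matching. Adding a gradient calculation module greatly reduces the row buffer requirement and saves more than 60% of resources. A linear approximation is used to optimize the hardware implementation of the exponential function, significantly reducing resource usage while minimizing precision loss. Second, in the cost aggregation stage, dual-path cost aggregation is combined with guided filter aggregation. The aggregation path is optimized and the critical path of the aggregation calculation is split across multiple clock cycles, raising the system's operating frequency. In addition, by streamlining the guided filter computation and replacing division with multiplication, both computational efficiency and resource usage are improved. Furthermore, the disparity calculation stage introduces sub-pixel interpolation based on SRT division, using additional fixed-point fractional bits to refine disparities and improve the accuracy of the disparity map for distant objects; an average accuracy gain of 2% is achieved on the KITTI2015 dataset. Finally, in the post-processing stage, an efficient hardware architecture for synchronous hole filling and median filtering is designed, and reusing cascaded row buffers significantly reduces resource usage. Two different outputs can also be selected for occluded regions to meet varying application requirements. In the performance evaluation, this work achieves an error rate of 5.26% on the KITTI2015 dataset, 1.62 percentage points lower than other semi-global stereo matching studies. Under the same conditions (Stratix-V FPGA platform, 1920×1080 resolution, 128-disparity range), the architecture requires only 94,510 LUTs, significantly fewer than other studies. The system can also operate at 125 MHz and achieve a high throughput of 60 frames per second.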
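The linear approximation of the exponential function mentioned in the abstract can be sketched as follows. The record does not say which approximation the thesis uses, so this example assumes one common hardware-friendly scheme: rewrite exp(-x) as 2^(-t) with t = x·log2(e), handle the integer part of t with a right shift, and approximate 2^(-f) for the fractional part f by the straight line 1 - f/2. The function name `exp_neg_q8` and the Q8 fixed-point format are hypothetical choices.

```python
import math

# log2(e) in Q8 fixed point (8 fractional bits); evaluates to 369.
LOG2E_Q8 = round(math.log2(math.e) * 256)


def exp_neg_q8(x_q8):
    """Approximate exp(-x) for x >= 0 using only a multiply, shifts and a subtract.

    x_q8 is x in Q8 fixed point (x * 256); the result is also Q8.
    """
    t = (x_q8 * LOG2E_Q8) >> 8   # t = x * log2(e), Q8
    i = t >> 8                   # integer part of t
    f = t & 0xFF                 # fractional part of t, Q8
    mant = 256 - (f >> 1)        # linear segment: 2**(-f) ~= 1 - f/2, Q8
    return mant >> i             # 2**(-i) is a right shift


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.0, 2.0, 4.0):
        approx = exp_neg_q8(int(x * 256)) / 256
        print(f"x={x}: approx={approx:.4f}, exact={math.exp(-x):.4f}")
```

In this sketch the single line segment overestimates 2^(-f) by at most about 0.043, so the absolute error stays below roughly 0.05 over the tested range; using more segments would trade extra logic for precision.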
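Keywords	Field Programmable Gate Array; Hardware Accelerator; Binocular Vision; Semi-global Stereo Matching
Other keywords	Field Programmable Gate Array; Hardware Accelerator; Binocular Stereo Vision; Semi-global Stereo Matching
Language	English
Training category	Independent training
Year of enrollment	2021
Year of degree conferral	2024-06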
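Degree evaluation subcommittee	Electronic Science and Technology
Chinese Library Classification number	TN47
Source repository	Manual submission
Output type	Thesis
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/765631
Collection	Southern University of Science and Technology; Preparatory Office of the SUSTech-HKUST Shenzhen-Hong Kong School of Microelectronics (南方科技大学-香港科技大学深港微电子学院筹建办公室)
Recommended citation (GB/T 7714)	Li K. HIGH-PRECISION STEREO MATCHING ACCELERATOR BASED ON GRADIENT INFORMATION[D]. Shenzhen: Southern University of Science and Technology, 2024.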

Files in this item
File name/size	Document type	Version type	Access type	License	Action
12132452-李可-南方科技大学-香 (3204 KB)	--	--	Restricted access	--	Request full text