SUSTech KC (南方科技大学知识苑): Three-Dimensional Earthquake Simulation Based on Half-Precision Floating-Point Format and GPU Parallel Optimization

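Title	基于半精度浮点格式和GPU并行优化的三维地震模拟
Alternative title	Three-Dimensional Earthquake Simulation Based on Half-Precision Floating-Point Format and GPU Parallel Optimization
Name	万家亮
Name (pinyin)	WAN Jialiang
Student ID	12132705
Degree type	Master's
Degree discipline	080101 General Mechanics and Foundations of Mechanics
Discipline category / professional degree category	08 Engineering
Supervisor	徐建宽
Supervisor's affiliation	Department of Earth and Space Sciences
Thesis defense date	2024-05-10
Thesis submission date	2024-06-21
Degree-granting institution	Southern University of Science and Technology
Place of degree conferral	Shenzhen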
Abstract	Three-dimensional earthquake simulation is important for studying seismic wave propagation, assessing earthquake hazard, and guiding resource exploration. Large-scale simulations, however, are computationally expensive, and their heavy demands on compute and storage resources limit efficiency. Earthquake simulations have traditionally used single- and double-precision floating-point formats. In recent years, the rapid development of artificial intelligence has driven wide adoption of the half-precision floating-point format (FP16), and many new processors natively support FP16 arithmetic and storage to deliver higher throughput. This thesis therefore introduces FP16 into earthquake simulation to improve computational efficiency and reduce hardware requirements. Based on the curved grid finite difference method (CGFDM), the elastic wave equations are solved in FP16. The limited range and precision of FP16, however, inevitably produce numerical overflow, underflow, and rounding errors during computation, rendering the simulation results unusable. To obtain stable solutions and minimize floating-point error, a scaling strategy is applied to FP16 arithmetic so that computed values always stay within the FP16 representable range and remain at similar orders of magnitude; from this, scaled elastic wave equations suited to FP16 are derived and applied to CGFDM. An FP16 earthquake simulation program is then developed on a GPU heterogeneous computing platform, with 2-way single instruction, multiple data (SIMD) optimization that exploits the hardware support of the floating-point units to make full use of the GPU's computing capability.
Earthquake simulations on different models, compared against FP32 results, verify the feasibility and accuracy of FP16 simulation. Time-frequency error analysis shows that the floating-point error introduced by the low-precision FP16 format is below 1.0%, which is within an acceptable range. FP16 also significantly reduces computational cost, achieving a 2× speedup over FP32 while halving memory usage. In summary, FP16 earthquake simulation maintains sufficient accuracy while improving computational efficiency, which helps shorten simulation time and enlarge the feasible simulation scale.
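The overflow and underflow problem described in the abstract, and the idea of rescaling values into the FP16 range, can be illustrated with a small sketch. This is not the thesis's scaled wave equations: the magnitudes and the power-of-two scale factors below are hypothetical, chosen only to trigger both failure modes. Rounding to FP16 uses the IEEE 754 binary16 format code `e` of Python's standard `struct` module.

```python
import struct

FP16_MAX = 65504.0               # largest finite binary16 value
FP16_MIN_SUBNORMAL = 2.0 ** -24  # smallest positive binary16 value
FP16_UNIT_ROUNDOFF = 2.0 ** -11  # max relative rounding error for normal values


def to_fp16(x):
    """Round x to the nearest binary16 value; out-of-range values become inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except (OverflowError, struct.error):
        return float("inf") if x > 0 else float("-inf")


def scaled_roundtrip(x, scale):
    """Store x * scale in FP16, then undo the scaling in higher precision."""
    return to_fp16(x * scale) / scale


# Hypothetical magnitudes (assumptions, not values from the thesis).
stress = 2.5e6     # larger than FP16_MAX, so it overflows
velocity = 3.0e-9  # far below FP16_MIN_SUBNORMAL, so it rounds to zero

# Assumed power-of-two scale factors; powers of two change only the exponent,
# so scaling and unscaling add no rounding error of their own.
stress_scale = 2.0 ** -20
velocity_scale = 2.0 ** 30

naive = (to_fp16(stress), to_fp16(velocity))
scaled = (
    scaled_roundtrip(stress, stress_scale),
    scaled_roundtrip(velocity, velocity_scale),
)

print("unscaled FP16:", naive)  # (inf, 0.0): both values are lost
print("scaled FP16:  ", scaled)

# Two FP16 values occupy the same 4 bytes as one FP32 value, which is why
# storage halves and why a 2-way SIMD instruction can process a pair at once.
print(struct.calcsize("<2e"), struct.calcsize("<f"))
```

With scaling, both values survive with a relative error bounded by the FP16 unit roundoff of 2^-11 (about 0.05%). A real solver would presumably derive its scale factors from the physical units and expected amplitudes of each field rather than pick them by hand as done here.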
Keywords	Earthquake simulation; half-precision floating-point format; curved grid finite difference method; GPU performance optimization
Language	Chinese
Training category	Independent training
Year of enrollment	2021
Date of degree conferral	2024-06
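Degree evaluation subcommittee	Mechanics
Chinese Library Classification number	P315.3
Source	Manual submission
Output type	Degree thesis
Item identifier	http://sustech.caswiz.com/handle/2SGJ60CL/765793
Collection	SUSTech College of Science_Department of Earth and Space Sciences
Recommended citation (GB/T 7714)	万家亮. 基于半精度浮点格式和GPU并行优化的三维地震模拟[D]. 深圳: 南方科技大学, 2024.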

Files in this item
File name / size	Document type	Version type	Access type	License
12132705-万家亮-地球与空间科学 (38396 KB)	--	--	Restricted access	--