Accelerating the Runge-Kutta Discontinuous Galerkin Method for Solving the Two-Dimensional Euler Equations on Multiple GPUs
Abstract:
Solving flow fields with the Runge-Kutta discontinuous Galerkin (RKDG) method is time-consuming. To reduce the runtime, this paper accelerates the solver on multiple GPUs, using the flow around a two-dimensional NACA0012 airfoil as the test case. The flow-field mesh is partitioned according to the number of GPUs, and each GPU computes one mesh block. On each computing node, kernels are launched with one thread per mesh element, so that the thread count equals the number of mesh elements, and data is exchanged between nodes with MPI (Message Passing Interface). During communication, CUDA streams and non-blocking MPI operations are used to overlap data transfer with computation and reduce the communication cost. Compared with the serial CPU program, the code achieves speedups of about 33, 59, and 108 on one, two, and four GPUs, respectively.
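To put the reported speedups in perspective, the short sketch below computes the multi-GPU scaling efficiency relative to the single-GPU run, and illustrates the one-thread-per-element sizing described in the abstract. Only the three speedup values come from the abstract. The balanced split in `elements_per_gpu`, the mesh size of 10,000 elements, and the block size of 256 threads are illustrative assumptions, not details taken from the paper, which does not state how the mesh is partitioned.

```python
from math import ceil

# Approximate speedups over the serial CPU program, as reported in the abstract.
REPORTED_SPEEDUP = {1: 33.0, 2: 59.0, 4: 108.0}


def scaling_efficiency(speedups):
    """Return {gpu_count: efficiency} relative to the single-GPU speedup."""
    base = speedups[1]
    return {n: s / (n * base) for n, s in speedups.items()}


def elements_per_gpu(num_elements, num_gpus):
    """Near-equal element counts per GPU (hypothetical balanced split)."""
    base, extra = divmod(num_elements, num_gpus)
    return [base + (1 if rank < extra else 0) for rank in range(num_gpus)]


def launch_blocks(num_local_elements, threads_per_block=256):
    """Thread blocks needed so that every element gets its own thread.

    threads_per_block=256 is an assumed value, not one from the paper.
    """
    return ceil(num_local_elements / threads_per_block)


if __name__ == "__main__":
    for n, eff in scaling_efficiency(REPORTED_SPEEDUP).items():
        print(f"{n} GPU(s): efficiency = {eff:.2f}")
    counts = elements_per_gpu(10_000, 4)  # 10_000 is an illustrative mesh size
    print(counts, [launch_blocks(c) for c in counts])
```

With the reported figures, the efficiency relative to one GPU is roughly 0.89 on two GPUs and 0.82 on four, which is consistent with some remaining communication cost even when transfers are overlapped with computation.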