Acceleration Techniques for Industrial Large Eddy Simulation with High-Order Methods on CPU-GPU Clusters
Issue Date
2021-05-31Author
Jourdan de Araujo Jorge Filho, Eduardo
Publisher
University of Kansas
Format
147 pages
Type
Dissertation
Degree Level
Ph.D.
Discipline
Aerospace Engineering
Rights
Copyright held by the author.
Metadata
Show full item recordAbstract
One of the NASA's 2030 CFD Vision document key finding is that the use of CFD in the aerospace design process is severely limited by the inability to accurately and reliably predict turbulent flows with significant regions of separation. Scale-resolving simulations such as large eddy simulation (LES) are increasingly utilized with more complex problems such as flow over high lift configurations and through aircraft engines. The present work has the overall objective of reducing the computational cost of industrial LES. The high-order flux reconstruction (FR) method is used as the spatial discretization scheme. First, two acceleration techniques are investigated: the p-multigrid algorithm and Mach number preconditioning. The Weiss and Smith low Mach number preconditioner is used together with the p-multigrid method, and the third order explicit Runge-Kutta (RK3) scheme is considered as the smoother to reduce memory requirements. Mach number preconditioning significantly increased the efficiency of the p-multigrid method. For unsteady simulations, the preconditioner helped with the efficiency of the p-multigrid with larger physical time steps. In most steady cases, the preconditioned p-multigrid approach is comparable to or faster than the implicit LU-SGS algorithm and requires less memory, specially for p 2 schemes. An efficient implementation of the FR method is done for modern GPU clusters and the speedup is investigated for different polynomial orders and cell types. Approaches to improve the parallel efficiency of multi-GPU simulations are also studied. The simulation node-hour cost on the Summit supercomputer is reduced by a factor of 50 for hexahedron cells and up to 200 for tetrahedron cells. Two low memory implicit time integration methods are implemented on GPUs: the matrix-free GMRES solver and a novel local GMRES-SGS method. Parametric studies are done to evaluate their performance on LES benchmark cases. On the High-Lift Common Research Model case for the 2021 4th AIAA High-Lift Prediction Workshop, both GPU implicit time methods provide an additional speedup of 14 and 68, respectively, over the GPU explicit time simulation.
Collections
- Dissertations [4454]
Items in KU ScholarWorks are protected by copyright, with all rights reserved, unless otherwise indicated.
We want to hear from you! Please share your stories about how Open Access to this item benefits YOU.