dc.contributor.author: Abdi, Daniel S.
dc.contributor.author: Giraldo, Francis X.
dc.contributor.author: Constantinescu, Emil M.
dc.contributor.author: Carr, Lester E., III
dc.contributor.author: Wilcox, Lucas C.
dc.contributor.author: Warburton, Timothy C.
dc.date.accessioned: 2017-08-16T21:22:51Z
dc.date.available: 2017-08-16T21:22:51Z
dc.date.issued: 2017
dc.identifier.citation: Not found [en_US]
dc.identifier.uri: https://hdl.handle.net/10945/55668
dc.description: The article of record as published may be found at http://dx.doi.org/10.1177/ToBeAssigned [en_US]
dc.description.abstract: We present the acceleration of an IMplicit-EXplicit (IMEX) non-hydrostatic atmospheric model on manycore processors such as GPUs and Intel’s MIC architecture. IMEX time integration methods sidestep the constraint imposed by the Courant-Friedrichs-Lewy condition on explicit methods through corrective implicit solves within each time step. In this work, we implement and evaluate the performance of IMEX on manycore processors relative to explicit methods. Using 3D-IMEX at Courant number C=15, we obtained a speedup of about 4X relative to an explicit time stepping method run with the maximum allowable C=1. Moreover, the unconditional stability of IMEX with respect to the fast waves means the speedup can increase significantly with the Courant number as long as the accuracy of the resulting solution is acceptable. We show a speedup of 100X at C=150 using 1D-IMEX to demonstrate this point. Several improvements to the IMEX procedure were necessary in order to outperform our results with explicit methods: a) reducing the number of degrees of freedom of the IMEX formulation by forming the Schur complement; b) formulating a horizontally-explicit vertically-implicit (HEVI) 1D-IMEX scheme that has a lower workload and potentially better scalability than 3D-IMEX; c) using high-order polynomial preconditioners to reduce the condition number of the resulting system; d) using a direct solver for the 1D-IMEX method by performing and storing LU factorizations once to obtain a constant cost for any Courant number. Without all of these improvements, explicit time integration methods turned out to be difficult to beat. We discuss in detail the IMEX infrastructure required for formulating and implementing efficient methods on manycore processors. Several parametric studies are conducted to demonstrate the gain from each of the above-mentioned improvements. Finally, we validate our results with standard benchmark problems in numerical weather prediction and evaluate the performance and scalability of the IMEX method using up to 4192 GPUs and 16 Knights Landing processors. [en_US]
dc.description.sponsorship: Department of Energy (DoE) [en_US]
dc.description.sponsorship: Office of Naval Research (ONR) [en_US]
dc.format.extent: 23 p. [en_US]
dc.publisher: Sage [en_US]
dc.rights: This publication is a work of the U.S. Government as defined in Title 17, United States Code, Section 101. Copyright protection is not available for this work in the United States. [en_US]
dc.title: Acceleration of the implicit-explicit non-hydrostatic unified model of the atmosphere (NUMA) on Manycore processors [en_US]
dc.type: Article [en_US]
dc.contributor.corporate: Naval Postgraduate School (U.S.) [en_US]
dc.contributor.department: Applied Mathematics [en_US]
dc.subject.author: IMEX [en_US]
dc.subject.author: NUMA [en_US]
dc.subject.author: GPU [en_US]
dc.subject.author: KNL [en_US]
dc.subject.author: Manycore [en_US]
dc.subject.author: HPC [en_US]
dc.subject.author: OCCA [en_US]
dc.subject.author: Atmospheric model [en_US]
dc.subject.author: Discontinuous Galerkin [en_US]
dc.subject.author: Continuous Galerkin [en_US]
dc.description.funder: Contract no. DE-AC05-00OR22725 (DoE) [en_US]
dc.description.funder: PE-0602435N (ONR) [en_US]
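
Illustrative note on item (d) of the abstract: the idea of factoring the implicit operator once and reusing the stored factors at every time step can be sketched on a toy problem. The example below is not NUMA code and does not reproduce the paper's equations. It assumes a 1D model problem u_t = u_zz + 1 with zero boundary values, treated with a first-order IMEX (forward/backward Euler) step in which the stiff diffusion term stands in for the fast implicit part and the constant source for the explicit part. The names factor_tridiagonal, solve_factored and run_imex are hypothetical, and the ratio r = dt/dz^2 is used only as a rough analogue of running far beyond an explicit stability limit.

```python
def factor_tridiagonal(lower, diag, upper):
    """LU-factor a tridiagonal matrix once (no pivoting).

    lower[i] is A[i][i-1] (lower[0] is unused); upper[i] is A[i][i+1]
    (the last entry is unused). Returns the stored factors.
    """
    n = len(diag)
    mult = [0.0] * n   # sub-diagonal multipliers of L
    piv = [0.0] * n    # diagonal of U
    piv[0] = diag[0]
    for i in range(1, n):
        mult[i] = lower[i] / piv[i - 1]
        piv[i] = diag[i] - mult[i] * upper[i - 1]
    return mult, piv, list(upper)


def solve_factored(factors, rhs):
    """Solve A x = rhs using factors from factor_tridiagonal (O(n) work)."""
    mult, piv, upper = factors
    n = len(piv)
    y = [0.0] * n
    y[0] = rhs[0]
    for i in range(1, n):              # forward substitution with L
        y[i] = rhs[i] - mult[i] * y[i - 1]
    x = [0.0] * n
    x[n - 1] = y[n - 1] / piv[n - 1]
    for i in range(n - 2, -1, -1):     # back substitution with U
        x[i] = (y[i] - upper[i] * x[i + 1]) / piv[i]
    return x


def run_imex(n=49, r=75.0, steps=200):
    """Integrate u_t = u_zz + 1 on (0, 1) with u = 0 at both ends.

    r = dt / dz**2. Forward Euler applied to the diffusion term would
    require r <= 0.5, so r = 75 is far outside the explicit limit.
    """
    dz = 1.0 / (n + 1)
    dt = r * dz * dz
    z = [(i + 1) * dz for i in range(n)]
    # The implicit operator (I - dt * D2) does not change in time,
    # so it is factored once, before the time loop.
    factors = factor_tridiagonal([-r] * n, [1.0 + 2.0 * r] * n, [-r] * n)
    u = [0.0] * n
    for _ in range(steps):
        rhs = [ui + dt * 1.0 for ui in u]   # explicit part: source term
        u = solve_factored(factors, rhs)    # implicit part: reuse factors
    return u, z
```

In this sketch the per-step cost is a single forward and back substitution regardless of r, which mirrors, in a much simplified setting, the constant cost per step that the abstract attributes to stored LU factorizations. Under these assumptions the solution returned by run_imex() should approach the steady state z(1 - z)/2.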