Xing Cai
Biography
Xing Cai received a PhD in Scientific Computing from the Department of Informatics at the University of Oslo in 1998. In 1999, he was appointed associate professor at the same university, and he was promoted to full professor in 2008. Dr. Cai joined Simula Research Laboratory at its very beginning in 2001, taking an 80% leave from his university position. He is now a Chief Research Scientist at Simula. Dr. Cai’s research interests include parallel programming and high-performance scientific computing on multi-core CPUs and GPUs, numerical methods for solving PDEs, and generic PDE software.
Heterogeneous Computing: Programming, Performance and Applications
Heterogeneous computing, loosely defined as drawing computational power from more than one type of processor, is likely to become the norm in future HPC. Exascale computing and beyond will require high energy efficiency, which conventional CPUs cannot deliver. This drives the rise of hardware accelerators that pack more, but simpler, cores into each device. Several new challenges arise, however, on platforms where CPUs and accelerators co-exist.
First, programming the accelerators can be very different from programming the CPUs. Although user-friendly APIs (such as OpenCL/OpenACC/OpenMP) offer a unified programming model for heterogeneous systems, the best performance is normally obtained by combining specific programming models that separately target the different processor types. Such hardware-specific, mixed code can be written explicitly by advanced programmers or, in some cases, generated by automated code translators. Second, a larger gap is expected between actual and peak performance on the accelerators than on the CPUs. This calls for more careful consideration when implementing scientific code for the accelerators. One particularly important issue concerns unstructured computational meshes, for which different orders of traversing the mesh entities may lead to drastically different performance on the accelerators. Another important and recurring issue is the strategic partitioning of unstructured meshes when the CPUs and accelerators are to share the computational work. Third, heterogeneous computing makes the cost of communication between the compute nodes increasingly visible. Overlapping communication with computation becomes more difficult to achieve, and in certain cases may even render the overall approach counterproductive.
This talk will touch upon some research activities that address the above issues of programming and performance, carried out by researchers at Simula Research Laboratory and their collaborators. In addition to simple examples of 3D stencil computation and sparse matrix-vector multiplication associated with unstructured tetrahedral meshes, real-world applications of heterogeneous CPU-GPU or CPU-Xeon Phi computing in geoscience and biomedical computing will also be presented.
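As a rough illustration of the sparse matrix-vector multiplication mentioned above, the following minimal Python sketch computes y = Ax for a matrix stored in compressed sparse row (CSR) format. It is not taken from the talk: the function name `csr_spmv`, the array names, and the small 3x3 matrix are assumptions chosen for illustration, and a production version would run as a compiled or GPU kernel rather than in pure Python.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR format.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and values.
    (All names here are illustrative assumptions.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Indirect access: which entries of x are read depends on
            # how the mesh entities (rows and columns) are numbered.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Hypothetical 3x3 example matrix:
# [[ 4, -1,  0],
#  [-1,  4, -1],
#  [ 0, -1,  4]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)
```

The indirect read `x[col_idx[k]]` is the part whose memory access pattern depends on the numbering of the mesh entities, which is one reason the traversal order of an unstructured mesh can affect performance on accelerators.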