Supercomputers are becoming even more supersized as enormous collections of lighter-weight processors are incorporated into already-giant computing clusters. The top machines still run loosely coupled parallel clients across the whole cluster, but each client now spawns hundreds of shared-memory threads that can exploit SIMD (Single Instruction, Multiple Data) hardware. What is driving the change? Over the past decade, limits on power dissipation have pushed CPUs to grow in core counts rather than clock rate. This has steered HPC toward "manycore" and GPU-like processors, and toward software that relies ever more heavily on both multithreading and SIMD.
Such trends have obvious implications for application developers. Even on ordinary workstations, a computationally intense application must be able to take full advantage of multiple CPU cores and their SIMD (vector) units, as well as any attached accelerators or coprocessors. In this talk, we take an in-depth look at Intel's Xeon Phi coprocessor as a possible prototype for computing platforms of the future. We examine the Xeon Phi's hardware characteristics, then dive into an extended example of how software might need to be restructured in order to get the best performance from this type of hardware.
Our main case study is taken from high-energy physics, because planned upgrades at CERN's Large Hadron Collider will soon cause detector data to be produced at an unprecedented pace, and new "manycore" techniques will be required for real-time processing to keep up. The existing software is based on the Kalman Filter method, which involves repetitious small-matrix operations that lack a natural SIMD formulation. Nevertheless, an abundance of parallel tasks is available, and we show how these tasks can be grouped in a way that is conducive to both multithreading and vectorization. Finally, we describe how we identified a problem that would otherwise have prevented the application from scaling to large numbers of threads.
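To make the grouping idea concrete, here is a minimal, hypothetical sketch (not the actual LHC tracking code) of one common approach: storing a batch of small matrices in a structure-of-arrays layout, so that the same element of every matrix is contiguous in memory and the innermost loop runs across the batch, where the compiler can apply SIMD. The names, dimensions, and batch width below are illustrative assumptions.

```cpp
// Hypothetical sketch: batch N small matrices in a structure-of-arrays
// layout so small-matrix multiplies vectorize across the batch dimension.
constexpr int D = 3;  // small-matrix dimension (illustrative)
constexpr int N = 8;  // batch width; ideally a multiple of the SIMD width

// Element (i,j) of all N matrices is stored contiguously as a[i][j][0..N-1].
struct Batch {
    float a[D][D][N];
};

// C = A * B for all N matrices at once. The inner n-loop touches
// contiguous memory and performs identical work per lane, so it is a
// natural candidate for compiler auto-vectorization.
void multiply(const Batch& A, const Batch& B, Batch& C) {
    for (int i = 0; i < D; ++i) {
        for (int j = 0; j < D; ++j) {
            for (int n = 0; n < N; ++n)
                C.a[i][j][n] = 0.0f;
            for (int k = 0; k < D; ++k)
                for (int n = 0; n < N; ++n)  // SIMD-friendly stride-1 loop
                    C.a[i][j][n] += A.a[i][k][n] * B.a[k][j][n];
        }
    }
}
```

In a conventional array-of-structures layout, each tiny D-by-D multiply offers too little work to fill a vector unit; transposing the storage as above converts many independent small problems into long, uniform inner loops, and separate batches can then be distributed across threads.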