GPGPU Acceleration

Research Overview

Research Leads: Raj Singh, Programmer Analyst, UC San Diego, and Dan Sandin, University of Illinois-Chicago

Quaternion Julia sets are four-dimensional fractals that can be rendered in 3D graphics space using a per-pixel iterative computation technique [1]. Rendering them is an extreme case of a computationally intensive floating-point problem: every pixel is rendered only after a couple of hundred iterative calculations. It is also an embarrassingly parallel problem that ports well to various parallel architectures; rendering a 4K animation of approximately 5000 frames takes over 20 days on a fast serial processor. I/O and memory bandwidth requirements are also very low, which makes this an ideal case for benchmarking different computing architectures.
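As a rough illustration of the per-pixel iteration described above, the following Python sketch applies the standard quaternion Julia recurrence q -> q^2 + c to one sample point per pixel. It is not the renderer from [1]: the function names, the bailout radius, the sampled plane and the default of 200 iterations (standing in for "a couple of hundred") are all assumptions made for this example.

```python
def quat_square(q):
    """Square a quaternion given as (w, x, y, z)."""
    w, x, y, z = q
    return (w * w - x * x - y * y - z * z, 2 * w * x, 2 * w * y, 2 * w * z)


def escape_iterations(q, c, max_iter=200, bailout=4.0):
    """Count iterations of q -> q^2 + c before |q|^2 exceeds the bailout.

    max_iter and bailout are illustrative assumptions, not source values.
    """
    for i in range(max_iter):
        w, x, y, z = q
        if w * w + x * x + y * y + z * z > bailout:
            return i
        sq = quat_square(q)
        q = tuple(a + b for a, b in zip(sq, c))
    return max_iter


def render_slice(width, height, c, max_iter=200):
    """Evaluate one 2D slice (y = z = 0) of the 4D set, one sample per pixel.

    Every pixel depends only on its own coordinates, which is why the
    problem is embarrassingly parallel: each loop body could be handed to
    a separate CPU core or GPU thread.
    """
    rows = []
    for j in range(height):
        row = []
        for i in range(width):
            w = -1.5 + 3.0 * i / (width - 1)
            x = -1.5 + 3.0 * j / (height - 1)
            row.append(escape_iterations((w, x, 0.0, 0.0), c, max_iter))
        rows.append(row)
    return rows
```

No pixel reads another pixel's result, so the two loops in render_slice can be split across workers in any order without synchronization.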

This experiment aimed to determine the work/watt numbers for GPU clusters and CPU clusters on pure computation jobs. We ported Dr. Daniel Sandin's Quaternion code to NVIDIA's GPU architecture using the CUDA toolkit. The original code can run on multiple cores of a workstation and across a cluster of multi-core CPUs. The GPU version was developed to function equivalently on a GPU cluster. The codes were run first on our state-of-the-art 12-node CPU cluster and then on our state-of-the-art 12-node GPU cluster. The CPU cluster nodes have dual quad-core Intel Xeon E5440 processors with 8 GB of RAM per node; the GPU cluster nodes have dual NVIDIA GTX 295 cards, with dual Intel Xeon E5440 processors as hosts and 8 GB of RAM. Each GTX 295 carries two GPUs, so the CPU cluster had 96 CPU cores and the GPU cluster had 48 GPUs.
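The core and GPU totals follow directly from the node configuration given above. The short Python sketch below reproduces that arithmetic; the helper name total_units is a hypothetical one chosen for this example.

```python
def total_units(nodes, devices_per_node, units_per_device):
    """Multiply out a homogeneous cluster configuration."""
    return nodes * devices_per_node * units_per_device


NODES = 12

# CPU cluster: 2 Xeon E5440 sockets per node, 4 cores per socket.
cpu_cores = total_units(NODES, devices_per_node=2, units_per_device=4)

# GPU cluster: 2 GTX 295 cards per node, 2 GPUs per card.
gpus = total_units(NODES, devices_per_node=2, units_per_device=2)

print(cpu_cores, gpus)
```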

Since we wanted to compare the energy efficiency of CPU clusters versus GPU clusters, we used all 96 CPU cores for the CPU runs and all 48 GPUs for the GPU runs. We also wanted to evaluate the energy consumption of different storage options, including local disks, NFS-mounted cluster storage, and fast network-mounted storage on SunFire X4600 appliances. Results were collected for animation frames of 640x480, 1280x720 and 4096x2048 in order to account for the effect of file systems on the storage of small, medium and large files.
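One possible way to express a work/watt figure for these runs is frames rendered per unit of energy, compared across the three frame sizes. The Python sketch below shows that bookkeeping only. The metric definition and the names pixels_per_frame and frames_per_kilojoule are assumptions for illustration, and no measured power or timing values from the experiment are included.

```python
# Frame sizes used in the experiment (width, height).
FRAME_SIZES = {
    "small": (640, 480),
    "medium": (1280, 720),
    "large": (4096, 2048),
}


def pixels_per_frame(width, height):
    """Number of independently computed pixels in one frame."""
    return width * height


def frames_per_kilojoule(frames, avg_power_watts, elapsed_seconds):
    """Assumed efficiency metric: frames rendered per kilojoule consumed.

    Energy in joules is average power (W) times elapsed time (s).
    """
    energy_joules = avg_power_watts * elapsed_seconds
    return 1000.0 * frames / energy_joules


for name, (w, h) in FRAME_SIZES.items():
    print(f"{name}: {w}x{h} = {pixels_per_frame(w, h):,} pixels")
```

Given a measured average power draw and wall-clock time for a run, calling frames_per_kilojoule with the frame count yields a figure that can be compared directly between the CPU and GPU clusters and between storage options.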

Technical White Papers

GPGPU Acceleration 2010