CUDA

CUDA is a parallel computing platform and programming model from Nvidia. The Nvidia Kepler K20X accelerators in the XK nodes support CUDA compute capability 3.5.

How to use CUDA

To use CUDA tools on Blue Waters, load the cudatoolkit module:

module load cudatoolkit

CUDA C code may be compiled with nvcc. Loading the cudatoolkit module will add nvcc to your PATH.
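As a minimal sketch of that workflow, the snippet below writes a small CUDA C source file and compiles it with nvcc when nvcc is on the PATH. The file name vecadd.cu, the kernel itself, and the choice of -arch=sm_35 (picked to match the K20X's compute capability 3.5) are illustrative assumptions, not something this page prescribes.

```shell
# Write a small, hypothetical CUDA C program (vecadd.cu) for illustration.
# Error checking of CUDA calls is omitted to keep the sketch short.
cat > vecadd.cu <<'EOF'
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vecadd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1024;
    const size_t bytes = n * sizeof(float);
    float ha[n], hb[n], hc[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    float *da, *db, *dc;
    cudaMalloc(&da, bytes);
    cudaMalloc(&db, bytes);
    cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // One thread per element, 256 threads per block.
    vecadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", hc[0]);
    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
EOF

# Compile only if nvcc is on PATH (i.e. after "module load cudatoolkit").
if command -v nvcc >/dev/null 2>&1; then
    nvcc -arch=sm_35 -o vecadd vecadd.cu
else
    echo "nvcc not found; run 'module load cudatoolkit' first" >&2
fi
```

The resulting executable needs a GPU, so it would have to be launched with aprun on an XK node from within a batch job rather than on a login node.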

Cray provides environment variables to help build CUDA code when using the Cray or PGI programming environment with an MPI application's Makefile or configure script. Check these variables if you need to set include or library paths manually for your build:

echo $CRAY_CUDATOOLKIT_INCLUDE_OPTS
echo $CRAY_CUDATOOLKIT_POST_LINK_OPTS
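One way these variables might be used in a hand-written build step is sketched below. The file names host.c, kernel.o, and app are hypothetical, and the split into a compile line and a link line is an assumption about a typical build, not a documented recipe. The script prints the commands rather than running them so they can be inspected first.

```shell
# The two variables are set by "module load cudatoolkit".
# ${VAR:-} keeps the script usable (with empty flags) when the module is not loaded.
inc_opts="${CRAY_CUDATOOLKIT_INCLUDE_OPTS:-}"
link_opts="${CRAY_CUDATOOLKIT_POST_LINK_OPTS:-}"

# Hypothetical file names: host.c (host code), kernel.o (object built by nvcc).
compile_cmd="cc ${inc_opts} -c host.c -o host.o"
link_cmd="cc -o app host.o kernel.o ${link_opts}"

# Print the commands instead of running them.
echo "$compile_cmd"
echo "$link_cmd"
```

In a Makefile or configure script, the same two variables would typically be appended to the include flags and to the end of the link line, respectively.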

To see which environment variables the cudatoolkit module defines, run:

module show cudatoolkit

CUDA Fortran

The PGI programming environment supports the CUDA Fortran language. To build CUDA Fortran code, load the PGI programming environment:

module swap PrgEnv-cray PrgEnv-pgi
module load cudatoolkit

To build the sample code:

wget http://www.pgroup.com/lit/samples/matmul.CUF
ftn -o matmul.x matmul.CUF

Run the test with the following PBS script:

#PBS -l nodes=1:ppn=16:xk
#PBS -l walltime=0:05:00
cd $PBS_O_WORKDIR
aprun -n1 ./matmul.x > job.out
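The job script above could be saved to a file and submitted roughly as follows. The file name matmul.pbs is a hypothetical choice, and the sketch assumes the standard PBS qsub command is available on the login node; it only prints a reminder when qsub is not found.

```shell
# Save the job script under a hypothetical name, matmul.pbs.
# The quoted 'EOF' stops the shell from expanding $PBS_O_WORKDIR here.
cat > matmul.pbs <<'EOF'
#PBS -l nodes=1:ppn=16:xk
#PBS -l walltime=0:05:00
cd $PBS_O_WORKDIR
aprun -n1 ./matmul.x > job.out
EOF

# Submit only where the batch system is available (e.g. a login node).
if command -v qsub >/dev/null 2>&1; then
    qsub matmul.pbs
else
    echo "qsub not found; submit matmul.pbs from a login node" >&2
fi
```

Once the job finishes, the program output should appear in job.out in the directory the job was submitted from.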

CUDA with CMake

By default, CMake does not configure CUDA correctly on Blue Waters. To work around this, point CMake at the Cray C++ compiler wrapper by adding -DCUDA_HOST_COMPILER=$(which CC) to your CMake options.
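A sketch of how that option might be passed in an out-of-source build is shown below. The build directory name and the presence of a CMakeLists.txt in the current directory are assumptions. When cmake or the project file is missing, the script only prints the command it would have run.

```shell
# Resolve the Cray C++ wrapper; fall back to the bare name "CC" if it is not on PATH.
host_compiler=$(command -v CC 2>/dev/null || echo CC)
cmake_opts="-DCUDA_HOST_COMPILER=${host_compiler}"

# Hypothetical out-of-source build in ./build.
mkdir -p build
if command -v cmake >/dev/null 2>&1 && [ -f CMakeLists.txt ]; then
    (cd build && cmake "${cmake_opts}" ..)
else
    echo "would run: cmake ${cmake_opts} .."
fi
```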

libsci_acc

Cray libsci_acc provides BLAS, LAPACK, and ScaLAPACK routines that improve performance by generating and running automatically tuned accelerator kernels on the XK nodes when appropriate. To use it with PrgEnv-cray or PrgEnv-gnu, load the module:

module load craype-accel-nvidia35 # <-- automatically includes libsci_acc

aprun -cc none -n <numranks> ... # Cray recommends allowing threads to migrate within a node when using libsci_acc

Nsight

Nvidia's Nsight, an Eclipse-based integrated development environment (IDE), is installed and available at:

$CUDATOOLKIT_HOME/libnsight/nsight

Caveat: because Nsight is Eclipse-based, its GUI has many widgets and performs best when you are close to our LAN. If your ping time to the system is over 50 ms, you may find the tool difficult to use. In that case, it may be worth installing your own local copy of CUDA if you want to use Nsight for kernel development.

Example code

The following source code is from Nvidia's cudasamples tar bundle and demonstrates techniques for compiling a basic MPI program with CUDA code. The first example works with cudatoolkit and PrgEnv-cray or PrgEnv-pgi.

simpleMPI.h, simpleMPI.cpp, simpleMPI.cu

nvcc -c -gencode=arch=compute_35,code=compute_35 \
    -o simpleMPIcuda.o simpleMPI.cu
CC -o simpleMPI simpleMPI.cpp simpleMPIcuda.o

The next example demonstrates compiling with PrgEnv-gnu and an earlier gcc version (gcc/4.6.3) that is compatible with nvcc. Here the Cray CC wrapper, passed on the nvcc command line, supplies the MPI headers and libraries. The t.cu file combines the source from simpleMPI.cu and simpleMPI.cpp.

nvcc -gencode=arch=compute_35,code=compute_35 -o gnusimpleMPI \
    --compiler-bindir `which CC` t.cu