GPU Computing

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. The highly parallel structure of GPUs makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel.

As GPUs have become more powerful, they are increasingly used for numerical calculations, a practice known as general-purpose computing on graphics processing units (GPGPU). Because of their design, GPUs are best suited to high-throughput computations that exhibit data parallelism. A typical case is Single Instruction, Multiple Data (SIMD) operations, where the same operation is applied to many data elements at once.

Today GPU computing is an integral part of an HPC environment. As of 2019, three of the 10 most powerful supercomputers in the world take advantage of GPU acceleration.

Both of our clusters, Spruce Knob and Thorny Flat, have some nodes with GPUs for jobs that can take advantage of them.

Hardware configurations

The GPUs on Spruce Knob are from the Kepler GPU microarchitecture (NVIDIA Tesla K20m accelerator). On Thorny Flat, the GPUs are all from the Pascal GPU microarchitecture (NVIDIA Quadro P6000).

Spruce Knob

Spruce has 2 community nodes with GPUs. One has 1x NVIDIA K20m and the second one has 3x NVIDIA K20m.

NVIDIA K20	Specifications
Launch	November 12, 2012
Chips	1x GK110
NVIDIA CUDA Cores	2496
Base Clock	706 MHz
Max Boost Clock	758 MHz
Memory Type	GDDR5
Memory Bus Width	320-bit
GPU Memory Size	5 GB
Memory Clock	5200 MT/s
Memory Bandwidth	208 GB/s
Single Precision (MAD or FMA)	3524 GFLOPS
Double Precision (FMA)	1175 GFLOPS
CUDA Compute Capability	3.5
Thermal Design Power (TDP)	225 W
Form Factor	Full Height, Dual Slot
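
The memory bandwidth in the table follows from the bus width and the memory transfer rate: bytes per transfer (bus width divided by 8) multiplied by transfers per second. The following is a minimal sketch of that arithmetic; the function name mem_bandwidth_gbs is hypothetical and is not a command available on the clusters.

```shell
# Hypothetical helper: theoretical memory bandwidth in GB/s.
# $1 = bus width in bits, $2 = transfer rate in MT/s.
mem_bandwidth_gbs() {
    # (bits / 8) bytes per transfer * MT/s = MB/s; divide by 1000 for GB/s.
    echo $(( $1 * $2 / 8 / 1000 ))
}

# K20 values from the table above: 320-bit bus at 5200 MT/s.
mem_bandwidth_gbs 320 5200   # prints 208
```

Run in reverse on the Quadro P6000 figures listed for Thorny Flat (384-bit, 432 GB/s), the same relation implies a transfer rate of 9000 MT/s, a value that table does not list.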

Thorny Flat

Thorny has 6 community nodes with 3x NVIDIA P6000 per node.

NVIDIA QUADRO P6000	Specifications
GPU Memory Size	24 GB GDDR5X
Memory Interface	384-bit
Memory Bandwidth	432 GB/s
NVIDIA CUDA Cores	3840
System Interface	PCI Express 3.0 x16
Max Power Consumption	250 W
Thermal Solution	Active
Form Factor	Dual Slot, Full Height
Compute APIs	CUDA, OpenCL

Interactive computing with GPUs

On Spruce we have one queue for GPU computing. For interactive jobs, execute:

qsub -I -q comm_gpu -l nodes=1:ppn=1:gpus=1

On Thorny Flat we have separate queues for interactive and non-interactive jobs. In the case of interactive jobs, execute:

qsub -I -q comm_gpu_inter -l nodes=1:ppn=1:gpus=3
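
The Spruce request shown above can also be written as a batch script for the same comm_gpu queue. The following is a minimal sketch, not a site-provided template: the job name gpu_check and the five-minute walltime are assumptions, and only the queue and the resource request come from the interactive command for Spruce. The non-interactive queue on Thorny Flat is not named in this section, so no Thorny script is sketched here.

```shell
#!/bin/bash
#PBS -N gpu_check
#PBS -q comm_gpu
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -l walltime=00:05:00

# Torque sets PBS_O_WORKDIR to the directory the job was submitted from;
# fall back to the current directory when the script is run by hand.
workdir="${PBS_O_WORKDIR:-$PWD}"
cd "$workdir" || exit 1

# List the GPUs visible on this node. The guard keeps the script usable
# on nodes where the NVIDIA driver is not installed.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
else
    echo "nvidia-smi not found on $(uname -n)"
fi
```

Saved under a name of your choice, for example gpu_check.pbs (a hypothetical file name), the script would be submitted with qsub gpu_check.pbs.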

Checking the presence of GPUs

The command nvidia-smi can be used to check the presence and status of the GPUs on the current compute node. For example, on Spruce the command returns something like:

$> nvidia-smi
Tue Sep 24 15:48:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 00000000:08:00.0 Off |                    0 |
| N/A   40C    P0    73W / 225W |     96MiB /  4743MiB |     44%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          Off  | 00000000:24:00.0 Off |                    0 |
| N/A   40C    P0    49W / 225W |      0MiB /  4743MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20m          Off  | 00000000:27:00.0 Off |                    0 |
| N/A   34C    P0    52W / 225W |      0MiB /  4743MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|                                                                             |
+-----------------------------------------------------------------------------+

On Thorny Flat, the same command returns something like:

$> nvidia-smi
Tue Sep 24 15:56:16 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:37:00.0 Off |                  Off |
| 26%   53C    P0   176W / 250W |    523MiB / 24449MiB |     94%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P6000        Off  | 00000000:AF:00.0 Off |                  Off |
| 26%   53C    P0   152W / 250W |    527MiB / 24449MiB |     94%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Quadro P6000        Off  | 00000000:D8:00.0 Off |                  Off |
| 26%   50C    P0   162W / 250W |    523MiB / 24449MiB |     96%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|                                                                             |
+-----------------------------------------------------------------------------+
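
The tables above are meant to be read by a person. For scripts, nvidia-smi can also print selected fields as CSV through its --query-gpu and --format options. The following is a minimal sketch that lists the GPUs whose utilization is at or below a threshold; the helper name list_idle_gpus and the default threshold of 5% are assumptions made for illustration.

```shell
# Hypothetical helper: reads "index, utilization, memory_used" CSV lines
# on stdin and prints the index of each GPU whose utilization (in %)
# is at or below the threshold given as $1 (default 5).
list_idle_gpus() {
    awk -F', *' -v max="${1:-5}" '$2 + 0 <= max { print $1 }'
}

# On a GPU node, feed the helper from nvidia-smi. The guard skips the
# call on machines without the NVIDIA driver.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
               --format=csv,noheader,nounits | list_idle_gpus 5
fi
```

With the utilization values from the Spruce output above (44%, 0%, and 91%), the helper would print only index 1. A field that the driver reports as unsupported is treated as 0 by this sketch, so such a GPU would be listed as idle.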