GPU Computing

A graphics processing unit (GPU) is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. The highly parallel structure of GPUs makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel.

As GPUs became more powerful, they started being used to perform general numerical calculations, an approach known as general-purpose computing on graphics processing units (GPGPU). Due to their design, GPUs are well suited to high-throughput computations that exhibit data parallelism. One typical case is Single Instruction, Multiple Data (SIMD) operations.
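As an illustration of this data-parallel model, the following is a minimal CUDA sketch (not tied to any software installed on our clusters) in which each GPU thread adds one pair of array elements, so a single instruction is effectively applied to many data elements at once. File, kernel, and variable names are placeholders.

#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles exactly one element: a classic data-parallel kernel.
__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;            // one million elements
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);     // unified memory keeps the sketch short
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);

    for (int i = 0; i < n; i++) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vector_add<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);      // expected value: 3.0

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return 0;
}

A file like this (e.g. vector_add.cu) is compiled with nvcc; the exact compiler version and CUDA module to load depend on the software stack of each cluster.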

Today GPU computing is an integral part of HPC environments. Three of the 10 most powerful supercomputers in the world (as of 2019) take advantage of GPU acceleration.

Both of our clusters, Spruce Knob and Thorny Flat, have some nodes with GPUs for jobs that can take advantage of them.

Hardware configurations

The GPUs on Spruce Knob belong to the Kepler microarchitecture (NVIDIA K20 accelerators). On Thorny Flat all GPUs belong to the Pascal microarchitecture (NVIDIA Quadro P6000).

Spruce Knob

Spruce Knob has 2 community nodes with GPUs: one with 1x NVIDIA K20m and the other with 3x NVIDIA K20m.

NVIDIA K20

Specifications

Launch                           November 12, 2012
Chips                            1x GK110
NVIDIA CUDA Cores                2496
Base Clock                       706 MHz
Max Boost Clock                  758 MHz
Memory Type                      GDDR5
Memory Bus Width                 320-bit
GPU Memory Size                  5 GB
Memory Clock                     5200 MT/s
Memory Bandwidth                 208 GB/s
Single Precision (MAD or FMA)    3524 GFLOPS
Double Precision (FMA)           1175 GFLOPS
CUDA Compute Capability          3.5
Thermal Design Power (TDP)       225 W
Form Factor                      Full Height, Dual Slot

Thorny Flat

Thorny Flat has 6 community nodes with 3x NVIDIA Quadro P6000 per node.

NVIDIA QUADRO P6000

Specifications

GPU Memory Size                  24 GB GDDR5X
Memory Interface                 384-bit
Memory Bandwidth                 432 GB/s
NVIDIA CUDA Cores                3840
System Interface                 PCI Express 3.0 x16
Max Power Consumption            250 W
Thermal Solution                 Active
Form Factor                      Dual Slot, Full Height
Compute APIs                     CUDA, OpenCL

Interactive computing with GPUs

On Spruce Knob we have a single queue for GPU computing. For interactive jobs, execute:

qsub -I -q comm_gpu -l nodes=1:ppn=1:gpus=1

On Thorny Flat we have separate queues for interactive and non-interactive jobs. For interactive jobs, execute:

qsub -I -q comm_gpu_inter -l nodes=1:ppn=1:gpus=3
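For non-interactive runs, a job script is submitted with qsub. The script below is only a sketch: it reuses the comm_gpu queue and the resource request shown above for Spruce Knob, and the job name and executable are placeholders; on Thorny Flat use the corresponding non-interactive GPU queue instead.

#!/bin/bash
# Placeholder job name and the Spruce Knob GPU queue; on Thorny Flat
# replace comm_gpu with the non-interactive GPU queue of that cluster.
#PBS -N gpu_example
#PBS -q comm_gpu
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR

# Report the GPUs assigned to the job, then run the (placeholder) application
nvidia-smi
./vector_add

The script would be submitted with qsub gpu_example.pbs (any file name works).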

Checking the presence of GPUs

The command nvidia-smi can be used to monitor the presence and status of the GPUs on the current compute node. For example, on Spruce Knob the command returns something like:

$> nvidia-smi
Tue Sep 24 15:48:45 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.26                 Driver Version: 396.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K20m          Off  | 00000000:08:00.0 Off |                    0 |
| N/A   40C    P0    73W / 225W |     96MiB /  4743MiB |     44%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla K20m          Off  | 00000000:24:00.0 Off |                    0 |
| N/A   40C    P0    49W / 225W |      0MiB /  4743MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla K20m          Off  | 00000000:27:00.0 Off |                    0 |
| N/A   34C    P0    52W / 225W |      0MiB /  4743MiB |     91%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|                                                                             |
+-----------------------------------------------------------------------------+

On Thorny Flat, the same command returns something like:

$> nvidia-smi
Tue Sep 24 15:56:16 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro P6000        Off  | 00000000:37:00.0 Off |                  Off |
| 26%   53C    P0   176W / 250W |    523MiB / 24449MiB |     94%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Quadro P6000        Off  | 00000000:AF:00.0 Off |                  Off |
| 26%   53C    P0   152W / 250W |    527MiB / 24449MiB |     94%   E. Process |
+-------------------------------+----------------------+----------------------+
|   2  Quadro P6000        Off  | 00000000:D8:00.0 Off |                  Off |
| 26%   50C    P0   162W / 250W |    523MiB / 24449MiB |     96%   E. Process |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|                                                                             |
+-----------------------------------------------------------------------------+
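The presence of GPUs can also be verified programmatically. The following stand-alone CUDA sketch (not a tool provided on the clusters) enumerates the devices visible to the process using the CUDA runtime API, which is a useful sanity check inside a job:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Found %d CUDA device(s)\n", count);

    for (int i = 0; i < count; i++) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Report name, compute capability, and total memory of each device
        printf("Device %d: %s (compute %d.%d, %.1f GB)\n",
               i, prop.name, prop.major, prop.minor,
               prop.totalGlobalMem / (1024.0 * 1024.0 * 1024.0));
    }
    return 0;
}

On the nodes shown above this would report Tesla K20m devices (compute capability 3.5) on Spruce Knob and Quadro P6000 devices (compute capability 6.1) on Thorny Flat.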