Thorny Flat (2019-)

Thorny Flat is WVU's current general-purpose HPC cluster. It was deployed in February 2019 and funded in large part by NSF Major Research Instrumentation (MRI) Grant Award #1726534. The cluster has a total of 6196 CPU cores spread over 170 nodes connected by a shared Intel Omnipath 100 Gbps interconnect. The system is a heterogeneous cluster with several different node types. In addition, new hardware is added to the cluster each year; each addition is known as a phase. The cluster is planned to have five phases and is currently in Phase 2.

Thorny Flat

Overview

(This text can be used for proposals to Grant Funding Agencies)

Thorny Flat is a general-purpose High-Performance Computing (HPC) cluster that serves the HPC needs of West Virginia University (WVU) and other higher-education institutions in West Virginia. It is hosted at the Pittsburgh Supercomputing Center and was built thanks to NSF Major Research Instrumentation (MRI) Grant Award #1726534.

Thorny Flat is a cluster of 178 compute nodes plus 4 management nodes. The compute nodes contain a total of 6516 CPU cores, distributed as follows: 140 compute nodes with dual-socket Intel(R) Xeon(R) Gold 6138 or 6230 processors (40 cores per node), 7 compute nodes with dual-socket Intel(R) Xeon(R) Gold 6126 processors (24 cores per node), 27 compute nodes with dual-socket Intel(R) Xeon(R) Silver 4210 processors (20 cores per node), and 4 compute nodes with dual-socket Intel(R) Xeon(R) Gold 6230R processors (52 cores per node). Memory on compute nodes ranges from 96 GB to 768 GB. The machines are interconnected using Intel(R) Omnipath(R) 100 Gbps with a blocking ratio of 5:1.

Thorny Flat has 11 compute nodes with hardware accelerators in the form of NVIDIA GPUs. There are a total of 47 GPU cards, distributed as follows: 7 compute nodes with 3 NVIDIA(R) Quadro P6000 24 GB PCIe GPUs each, 3 compute nodes with 8 NVIDIA(R) Quadro RTX 6000 24 GB PCIe GPUs each, and 1 compute node with 2 NVIDIA(R) A100 40 GB PCIe GPUs.

Thorny Flat scored 115 TeraFLOPS on the HPL (High-Performance Linpack) benchmark using 101 CPU-only compute nodes.

Thorny Flat uses SLURM for workload management and provides a variety of compilers, numerical libraries, and scientific software specifically compiled and optimized for the hardware architecture.
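As a sketch of how that software stack is typically accessed (this assumes the environment-modules/Lmod setup common on WVU Research Computing clusters; the module names shown are placeholders, not a list of what is actually installed):

  # List the software made available through the module system
  module avail

  # Search all module trees for a package (Lmod)
  module spider

  # Load a compiler and MPI stack before building or running code
  # ("gcc" and "openmpi" are illustrative module names)
  module load gcc openmpi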

Acknowledgment Message

We ask our users to acknowledge the use of Thorny Flat in all publications made possible by this resource. The acknowledgment could read as follows:

Computational resources were provided by the WVU Research Computing Thorny Flat HPC cluster, which is funded in part by NSF OAC-1726534.

Total Compute Resources

Aggregated numbers:

  • Total number of Compute nodes: 178

  • Total number of CPU cores: 6516

  • Total RAM: 29.4 TB

General description of compute nodes:

  • 167 CPU-only Compute nodes.

  • 7 Hardware-accelerated compute nodes with 3 NVIDIA P6000 each (21 GPU cards).

  • 3 Hardware-accelerated compute nodes with 8 NVIDIA RTX 6000 each (24 GPU cards).

  • 1 Hardware-accelerated compute node with 2 NVIDIA A100 (2 GPU cards).

  • 4 Management Nodes.

Shared Interconnect

All the nodes in Thorny Flat are interconnected using Intel(R) Omnipath(R) 100 Gbps with switches in a 5:1 blocking ratio.

Resource manager and system scheduler

Thorny Flat uses SLURM (version 22.05.6) as its workload manager.
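As a minimal sketch of a SLURM batch script for this cluster (the partition name and walltime come from the Partitions section below; the job name, core count, and executable are placeholders):

  #!/bin/bash
  #SBATCH --job-name=example        # placeholder job name
  #SBATCH --partition=standby       # standby partition, 4-hour walltime limit
  #SBATCH --nodes=1                 # one compute node
  #SBATCH --ntasks-per-node=40      # a 40-core Gold 6138/6230 node
  #SBATCH --time=04:00:00           # must not exceed the partition limit

  srun ./my_program                 # my_program is a placeholder executable

The script would be submitted with sbatch and monitored with squeue -u $USER.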

Hardware

Phase 0/1 Hardware

Small Memory (Community nodes: 64, Condo nodes: 13, Total: 77)

  • 2 x Intel® Xeon® Gold 6138 Processor (20 cores/CPU)
  • 96 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

Medium Memory (Community nodes: 0, Condo nodes: 16, Total: 16)

  • 2 x Intel® Xeon® Gold 6138 Processor (20 cores/CPU)
  • 192 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

Large Memory (Community nodes: 0, Condo nodes: 4, Total: 4)

  • 2 x Intel® Xeon® Gold 6138 Processor (20 cores/CPU)
  • 384 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

XL Memory (Community nodes: 3, Condo nodes: 1, Total: 4)

  • 2 x Intel® Xeon® Gold 6138 Processor (20 cores/CPU)
  • 768 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

GPU (Community nodes: 6, Condo nodes: 1, Total: 7)

  • 2 x Intel® Xeon® Gold 6126 Processor (12 cores/CPU)
  • 3 x NVIDIA Quadro P6000 24 GB PCIe GPUs
  • 96 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

Partitions

The current state and limits of the partitions can be queried with SLURM's sinfo command. The main shared partitions on server trcis002.hpc.wvu.edu and their maximum walltimes are:

Partition         Max Walltime
----------------  ------------
standby           04:00:00
comm_small_week   168:00:00
comm_small_day    24:00:00
comm_gpu_week     168:00:00
comm_xl_week      168:00:00
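For example, the following commands list the partitions and their limits directly from SLURM; the format string only selects the partition name, time limit, node count, and availability:

  # One line per partition: name, time limit, node count, availability
  sinfo -o "%P %l %D %a"

  # Full configuration of a single partition, e.g. comm_small_day
  scontrol show partition comm_small_day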

There are three main partition types: research team partitions, the standby partition, and community node partitions.

Research Team Partitions

Research teams that have bought their own compute nodes have private partitions that link all of their compute nodes together. Only users given permission by the research team's buyer (usually the lab's PI) may submit jobs directly to these partitions. Although these partitions are private, unused resources/compute nodes from them are made available to the standby partition (see below). However, per the system-wide policies, all of a research team's compute nodes must be available to that team's users within 4 hours of job submission. By default, these partitions use first-come, first-served queuing, but individual research teams can request different settings for their partition by contacting the RC HPC team.

Standby Partition

The standby partition is for using resources from research team partitions that are not currently in use. Priority on the standby partition is set by fair-share queuing. This means that user priority is assigned based on a combination of the size of the job and how many system resources the user has consumed during the given week, with higher priority assigned to larger jobs and/or to users who have used fewer system resources that week. Further, the standby partition has a 4-hour walltime limit.
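To see how fair-share queuing is affecting your jobs, the standard SLURM accounting tools can be used (a sketch; it assumes SLURM accounting is enabled, which this page does not state explicitly):

  # Fair-share usage and share values for your associations
  sshare -u $USER

  # Priority components (including the fair-share factor) of your pending jobs
  sprio -u $USER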

Community Node Partitions

Thorny Flat has several partitions whose names start with 'comm'. These partitions are linked to the 73 compute/GPU nodes bought with NSF funding and are therefore open for statewide higher-education use. They are separated by node type (e.g., small memory, extra-large memory, and GPU) and can be used by all users. Currently, these partitions are regulated by fair-share queuing: user priority is assigned based on a combination of the size of the job and how many system resources the user has consumed during the given week, with higher priority assigned to larger jobs and/or to users who have used fewer system resources that week. Further, all community partitions have a one-week walltime limit, except comm_small_day, which allows jobs of up to 24 hours and has access to a larger pool of resources than comm_small_week. These restrictions prevent a single user from occupying a large share of the community resources for an excessively long time. A sketch of a community GPU job follows.
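The example below submits to one of the community GPU partitions; the GRES name gpu and the resource values are assumptions, so check with the RC HPC team for the exact GPU request syntax:

  #!/bin/bash
  #SBATCH --partition=comm_gpu_week   # community GPU partition, one-week walltime limit
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=8
  #SBATCH --gres=gpu:1                # request one GPU card (GRES name assumed)
  #SBATCH --time=1-00:00:00           # one day, well under the one-week limit

  nvidia-smi                          # report which GPU was allocated
  srun ./my_gpu_program               # placeholder executable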

Hardware acceleration

Thorny Flat has 11 compute nodes with hardware accelerators in the form of NVIDIA GPU cards. The GPU models present on Thorny Flat are the NVIDIA Quadro P6000, the NVIDIA Quadro RTX 6000, and the NVIDIA A100. The following table describes the distribution of accelerators across the GPU compute nodes.

Node Name   Description                                             P6000   RTX 6000   A100
---------   -----------------------------------------------------   -----   --------   ----
tcogq001    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq002    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq003    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq004    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq005    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq006    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tbmgq001    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tbmgq100    2x Intel(R) Xeon(R) Gold 6230R @ 2.10GHz, 192 GB RAM         0          8      0
tbegq201    2x Intel(R) Xeon(R) Gold 6230R @ 2.10GHz, 192 GB RAM         0          8      0
tbegq202    2x Intel(R) Xeon(R) Gold 6230R @ 2.10GHz, 192 GB RAM         0          8      0
tbegq200    2x Intel(R) Xeon(R) Gold 6230R @ 2.10GHz, 192 GB RAM         0          0      2
---------   -----------------------------------------------------   -----   --------   ----
TOTAL       376 cores (7 x 24 + 4 x 52),                                21         24      2
            1440 GB RAM (7 x 96 GB + 4 x 192 GB)

The specifications of the three kinds of GPU cards on Thorny Flat are shown in the table below:

GPU Card Model    GPU Memory     CUDA Cores              Tensor Cores   Max Power   Compute Capability
---------------   ------------   ---------------------   ------------   ---------   ------------------
Quadro P6000      24 GB GDDR5X   3840                    --             250 W       6.1
Quadro RTX 6000   24 GB GDDR6    4608                    576            250 W       7.5
A100-PCIE-40GB    40 GB HBM2     6912 FP32 / 3456 FP64   432            250 W       8.0

Full specifications for the GPU cards can be found for the Quadro P6000, the Quadro RTX 6000, and the NVIDIA A100.

The GPUs in Thorny Flat have different compute capabilities. The compute capability of a device is represented by a version number, also sometimes called its “SM version”. This version number identifies the features supported by the GPU hardware and is used by applications at runtime to determine which hardware features and/or instructions are available on the present GPU.

The compute capability comprises a major revision number X and a minor revision number Y and is denoted by X.Y.

Devices with the same major revision number are of the same core architecture. The major revision number is 8 for devices based on the NVIDIA Ampere GPU architecture (like A100), 7 for devices based on the Volta architecture (like the Quadro RTX 6000), and 6 for devices based on the Pascal architecture (like the Quadro P6000).
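When building CUDA code that should run on all three GPU models, the three compute capabilities listed above can be targeted explicitly at compile time; a sketch of an nvcc invocation (the file names are placeholders):

  # Generate code for compute capability 6.1 (P6000), 7.5 (RTX 6000), and 8.0 (A100)
  nvcc my_kernel.cu -o my_kernel \
       -gencode arch=compute_61,code=sm_61 \
       -gencode arch=compute_75,code=sm_75 \
       -gencode arch=compute_80,code=sm_80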

You can also look up the compute capabilities of other GPU cards on NVIDIA's CUDA GPUs page.