Thorny Flat (2019-)

Thorny Flat is WVU's current general-purpose HPC cluster. It was deployed in February 2019 and funded in large part by NSF Major Research Instrumentation (MRI) Grant Award #1726534. The cluster has a total of 6196 CPU cores spread over 170 nodes connected by a shared Intel Omnipath 100 Gbps interconnect. The system is a heterogeneous cluster with several different node types. In addition, new hardware is added to the cluster each year; each addition is known as a phase. The cluster is planned to have five phases and is currently in Phase 2.

Thorny Flat

Overview

(This text can be used for proposals to Grant Funding Agencies)

Thorny Flat is a general-purpose High-Performance Computing (HPC) cluster that serves the HPC needs of West Virginia University (WVU) and other higher-education institutions in West Virginia. It is hosted at the Pittsburgh Supercomputing Center and was built thanks to NSF Major Research Instrumentation (MRI) Grant Award #1726534.

Thorny Flat is a cluster of 178 compute nodes plus 4 management nodes. The compute nodes contain a total of 6516 CPU cores, distributed as follows: 140 compute nodes with dual-socket Intel(R) Xeon(R) Gold 6138 or 6230 processors (40 cores per node), 7 compute nodes with dual-socket Intel(R) Xeon(R) Gold 6126 processors (24 cores per node), 27 compute nodes with dual-socket Intel(R) Xeon(R) Silver 4210 processors (20 cores per node), and 4 compute nodes with dual-socket Intel(R) Xeon(R) Gold 6230R processors (52 cores per node). Memory on compute nodes ranges from 96 GB to 768 GB. The machines are interconnected using Intel(R) Omnipath(R) 100 Gbps with a blocking ratio of 5:1.

Thorny Flat has 11 compute nodes with hardware accelerators in the form of NVIDIA GPUs. There are a total of 47 GPU cards, distributed as follows: 7 compute nodes with 3 NVIDIA(R) Quadro P6000 24 GB PCIe GPUs each, 3 compute nodes with 8 NVIDIA(R) Quadro RTX 6000 24 GB PCIe GPUs each, and 1 compute node with 2 NVIDIA(R) A100 40 GB PCIe GPUs.

Thorny Flat scored 115 TeraFLOPS on the HPL (High-Performance Linpack) benchmark using 101 CPU-only compute nodes.

Thorny Flat uses SLURM for workload management and provides a variety of compilers, numerical libraries, and scientific software specifically compiled and optimized for the hardware architecture.
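As a sketch of how that software stack is typically accessed (this assumes the environment-modules/Lmod setup common on WVU Research Computing clusters; the module names shown are placeholders, not a list of what is actually installed):

  # List the software made available through the module system
  module avail

  # Search all module trees for a package (Lmod)
  module spider

  # Load a compiler and MPI stack before building or running code
  # ("gcc" and "openmpi" are illustrative module names)
  module load gcc openmpi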

Acknowledgment Message

We ask our users to acknowledge the use of Thorny Flat in all publications made possible by this resource. The acknowledgment could read as follows:

Computational resources were provided by the WVU Research Computing Thorny Flat HPC cluster, which is funded in part by NSF OAC-1726534.

Total Compute Resources

Aggregated numbers:

  • Total number of Compute nodes: 178

  • Total number of CPU cores: 6516

  • Total RAM: 29.4 TB

General description of compute nodes:

  • 167 CPU-only Compute nodes.

  • 7 Hardware-accelerated compute nodes with 3 NVIDIA P6000 each (21 GPU cards).

  • 3 Hardware-accelerated compute nodes with 8 NVIDIA RTX 6000 each (24 GPU cards).

  • 1 Hardware-accelerated compute node with 2 NVIDIA A100 (2 GPU cards).

  • 4 Management Nodes.

Shared Interconnect

All the nodes in Thorny Flat are interconnected using Intel(R) Omnipath(R) 100 Gbps with switches in a 5:1 blocking ratio.

Resource manager and system scheduler

Thorny Flat uses SLURM (version 22.05.6) as its workload manager.
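As a minimal sketch of a SLURM batch script for this cluster (the partition name and walltime come from the Partitions section below; the job name, core count, and executable are placeholders):

  #!/bin/bash
  #SBATCH --job-name=example        # placeholder job name
  #SBATCH --partition=standby       # standby partition, 4-hour walltime limit
  #SBATCH --nodes=1                 # one compute node
  #SBATCH --ntasks-per-node=40      # a 40-core Gold 6138/6230 node
  #SBATCH --time=04:00:00           # must not exceed the partition limit

  srun ./my_program                 # my_program is a placeholder executable

The script would be submitted with sbatch and monitored with squeue -u $USER.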

Hardware

Phase 0/1 Hardware

Small Memory (Community nodes: 64, Condo nodes: 13, Total: 77)

  • 2 x Intel® Xeon® Gold 6138 Processor (20 cores/CPU)
  • 96 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

Medium Memory (Community nodes: 0, Condo nodes: 16, Total: 16)

  • 2 x Intel® Xeon® Gold 6138 Processor (20 cores/CPU)
  • 192 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

Large Memory (Community nodes: 0, Condo nodes: 4, Total: 4)

  • 2 x Intel® Xeon® Gold 6138 Processor (20 cores/CPU)
  • 384 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

XL Memory (Community nodes: 3, Condo nodes: 1, Total: 4)

  • 2 x Intel® Xeon® Gold 6138 Processor (20 cores/CPU)
  • 768 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

GPU (Community nodes: 6, Condo nodes: 1, Total: 7)

  • 2 x Intel® Xeon® Gold 6126 Processor (12 cores/CPU)
  • 3 x NVIDIA Quadro P6000 24 GB PCIe GPUs
  • 96 GB memory
  • 240 GB SSD
  • 100 Gb Omnipath
  • 5-year warranty

Partitions

The current state and limits of the partitions can be queried with SLURM's sinfo command. The main shared partitions on server trcis002.hpc.wvu.edu and their maximum walltimes are:

Partition         Max Walltime
----------------  ------------
standby           04:00:00
comm_small_week   168:00:00
comm_small_day    24:00:00
comm_gpu_week     168:00:00
comm_xl_week      168:00:00
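For example, the following commands list the partitions and their limits directly from SLURM; the format string only selects the partition name, time limit, node count, and availability:

  # One line per partition: name, time limit, node count, availability
  sinfo -o "%P %l %D %a"

  # Full configuration of a single partition, e.g. comm_small_day
  scontrol show partition comm_small_day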

There are three main partition types: research team partitions, the standby partition, and community node partitions.

Research Team Partitions

Research teams that have bought their own compute nodes have private partitions that link all of their compute nodes together. Only users given permission by the research team's buyer (usually the lab's PI) may submit jobs directly to these partitions. Although these partitions are private, unused resources/compute nodes from them are made available to the standby partition (see below). However, per the system-wide policies, all of a research team's compute nodes must be available to that team's users within 4 hours of job submission. By default, these partitions use first-come, first-served queuing, but individual research teams can request different settings for their partition by contacting the RC HPC team.

Standby Partition

The standby partition is for using resources from research team partitions that are not currently in use. Priority on the standby partition is set by fair-share queuing. This means that user priority is assigned based on a combination of the size of the job and how many system resources the user has consumed during the given week, with higher priority assigned to larger jobs and/or to users who have used fewer system resources that week. Further, the standby partition has a 4-hour walltime limit.
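To see how fair-share queuing is affecting your jobs, the standard SLURM accounting tools can be used (a sketch; it assumes SLURM accounting is enabled, which this page does not state explicitly):

  # Fair-share usage and share values for your associations
  sshare -u $USER

  # Priority components (including the fair-share factor) of your pending jobs
  sprio -u $USER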

Community Node Partitions

Thorny Flat has several partitions whose names start with 'comm'. These partitions are linked to the 73 compute/GPU nodes bought with NSF funding and are therefore open for statewide higher-education use. They are separated by node type (e.g., small memory, extra-large memory, and GPU) and can be used by all users. Currently, these partitions are regulated by fair-share queuing: user priority is assigned based on a combination of the size of the job and how many system resources the user has consumed during the given week, with higher priority assigned to larger jobs and/or to users who have used fewer system resources that week. Further, all community partitions have a one-week walltime limit, except comm_small_day, which allows jobs of up to 24 hours and has access to a larger pool of resources than comm_small_week. These restrictions prevent a single user from occupying a large share of the community resources for an excessively long time. A sketch of a community GPU job follows.
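The example below submits to one of the community GPU partitions; the GRES name gpu and the resource values are assumptions, so check with the RC HPC team for the exact GPU request syntax:

  #!/bin/bash
  #SBATCH --partition=comm_gpu_week   # community GPU partition, one-week walltime limit
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=8
  #SBATCH --gres=gpu:1                # request one GPU card (GRES name assumed)
  #SBATCH --time=1-00:00:00           # one day, well under the one-week limit

  nvidia-smi                          # report which GPU was allocated
  srun ./my_gpu_program               # placeholder executable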

Hardware acceleration

Thorny Flat has 11 compute nodes with hardware accelerators in the form of NVIDIA GPU cards. The GPU models present on Thorny Flat are the NVIDIA Quadro P6000, the NVIDIA Quadro RTX 6000, and the NVIDIA A100. The following table describes the distribution of accelerators across the GPU compute nodes.

Node Name   Description                                             P6000   RTX 6000   A100
---------   -----------------------------------------------------   -----   --------   ----
tcogq001    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq002    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq003    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq004    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq005    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tcogq006    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tbmgq001    2x Intel(R) Xeon(R) Gold 6126 @ 2.60GHz, 96 GB RAM           3          0      0
tbmgq100    2x Intel(R) Xeon(R) Gold 6230R @ 2.10GHz, 192 GB RAM         0          8      0
tbegq201    2x Intel(R) Xeon(R) Gold 6230R @ 2.10GHz, 192 GB RAM         0          8      0
tbegq202    2x Intel(R) Xeon(R) Gold 6230R @ 2.10GHz, 192 GB RAM         0          8      0
tbegq200    2x Intel(R) Xeon(R) Gold 6230R @ 2.10GHz, 192 GB RAM         0          0      2
---------   -----------------------------------------------------   -----   --------   ----
TOTAL       376 cores (7 x 24 + 4 x 52),                                21         24      2
            1440 GB RAM (7 x 96 GB + 4 x 192 GB)

The specifications of the three kinds of GPU cards on Thorny Flat are shown in the table below:

GPU Card Model    GPU Memory     CUDA Cores              Tensor Cores   Max Power   Compute Capability
---------------   ------------   ---------------------   ------------   ---------   ------------------
Quadro P6000      24 GB GDDR5X   3840                    --             250 W       6.1
Quadro RTX 6000   24 GB GDDR6    4608                    576            250 W       7.5
A100-PCIE-40GB    40 GB HBM2     6912 FP32 / 3456 FP64   432            250 W       8.0

Full specifications for the GPU cards can be found for the Quadro P6000, the Quadro RTX 6000, and the NVIDIA A100.

The GPUs in Thorny Flat have different compute capabilities. The compute capability of a device is represented by a version number, also sometimes called its “SM version”. This version number identifies the features supported by the GPU hardware and is used by applications at runtime to determine which hardware features and/or instructions are available on the present GPU.

The compute capability comprises a major revision number X and a minor revision number Y and is denoted by X.Y.

Devices with the same major revision number are of the same core architecture. The major revision number is 8 for devices based on the NVIDIA Ampere GPU architecture (like A100), 7 for devices based on the Volta architecture (like the Quadro RTX 6000), and 6 for devices based on the Pascal architecture (like the Quadro P6000).
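When building CUDA code that should run on all three GPU models, the three compute capabilities listed above can be targeted explicitly at compile time; a sketch of an nvcc invocation (the file names are placeholders):

  # Generate code for compute capability 6.1 (P6000), 7.5 (RTX 6000), and 8.0 (A100)
  nvcc my_kernel.cu -o my_kernel \
       -gencode arch=compute_61,code=sm_61 \
       -gencode arch=compute_75,code=sm_75 \
       -gencode arch=compute_80,code=sm_80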

You can also look up the compute capabilities of other GPU cards on NVIDIA's CUDA GPUs page.