Dolly Sods (2023-)
==================

Dolly Sods is WVU's latest HPC Cluster. It was deployed in July/August 2023 and was funded by `NSF Major Research Instrumentation (MRI) Grant Award #2117575 <https://www.nsf.gov/awardsearch/showAward?AWD_ID=2117575>`_. 
The cluster has a total of 37 nodes. 
The system has 155 NVIDIA GPU cards. 
They are distributed as 120 NVIDIA A30, 19 NVIDIA A40, and 16 NVIDIA A100.

.. |logo1| image:: /_static/Dolly_Sods_1.jpg  
   :scale: 6%
   :align: top
.. |logo2| image:: /_static/Dolly_Sods_2.jpg
   :scale: 6%
   :align: top
.. |logo3| image:: /_static/Dolly_Sods_3.jpg
   :scale: 13%
   :align: top

.. list-table:: Dolly Sods
   :widths: 62 38
   :header-rows: 0

   * - |logo3|
     - | |logo1| 
       | 
       | |logo2|

Overview
--------

*(This text can be used for proposals to Grant Funding Agencies)*

Dolly Sods is a hardware-accelerated High-Performance Computing (HPC) cluster. Dolly Sods serves the HPC needs of West Virginia University (WVU) and other higher education institutions in West Virginia. It is hosted in Morgantown, WV, and was built thanks to NSF Major Research Instrumentation (MRI) Grant Award #2117575

Dolly Sods is a cluster of 37 compute nodes.
All compute nodes have hardware accelerators in the form of NVIDIA GPUs. 
There is a total of 155 GPU cards distributed as follows: 
30 compute nodes with 4 GPUs NVIDIA(R) A30 24GB PCIe. 
4 compute nodes with 4 GPUs NVIDIA(R) A40 24GB PCIe. 
2 compute nodes with 8 GPUs NVIDIA(R) A100 40GB PCIe.
The login node also has 3 GPUs NVIDIA(R) A40 24GB PCIe. 

Dolly Sods uses SLURM for workload management and it has a variety of compilers, numerical libraries, and scientific software specifically compiled and optimized for the hardware architecture.

Acknowledgment Message
----------------------

We ask our users to acknowledge the use of Dolly Sods in all possible publications thanks to this resource. The message in the acknowledgment section could be as follows:

  *Computational resources were provided by the WVU Research Computing Dolly Sods HPC cluster, which is funded in part by NSF OAC-2117575.*


Hardware Acceleration
---------------------

The specifications of the three kinds of GPU cards on Dolly Sods are shown in the table below

+-----------------+-----------------+--------------+----------+----------------+---------------+
| | GPU Card      | | GPU           | | CUDA       | | Tensor | | Max Power    | | Compute     |
| | Model         | | Memory        | | Cores      | | Cores  | | Compsumption | | Capability  |
+=================+=================+==============+==========+================+===============+
| NVIDIA A30      | 24 GB HBM2      | 3804         | 224      | 165 W          | 8.0           |
+-----------------+-----------------+--------------+----------+----------------+---------------+
| NVIDIA A40      | 48 GB GDDR6 ECC | 10752        | 336      | 300 W          | 8.6           |
+-----------------+-----------------+--------------+----------+----------------+---------------+
| NVIDIA A100     | 40 GB HBM2e ECC | | 6912 FP32  | 432      | 250 W          | 8.0           |
|                 |                 | | 3456 FP64  |          |                |               |
+-----------------+-----------------+--------------+----------+----------------+---------------+

Full specifications for the GPU cards can be found for `NVIDIA A30`_ , `NVIDIA A40`_ and `NVIDIA A100`_

The GPUs in Sods have different compute capabilities.
The compute capability of a device is represented by a version number, also sometimes called its “SM version”.
This version number identifies the features supported by the GPU hardware and is used by applications at runtime to determine which hardware features and/or instructions are available on the present GPU.

The compute capability comprises a major revision number X and a minor revision number Y and is denoted by X.Y.
All the nodes in Dolly Sods have the same major revision number but different minor revision numbers.
Devices with the same major revision number are of the same core architecture.
The major revision number is 8 for devices based on the NVIDIA Ampere GPU architecture.

You can see `Compute Capabilities`_ for other GPU cards.


.. _NVIDIA A30: https://www.nvidia.com/content/dam/en-zz/Solutions/data-center/products/a30-gpu/pdf/a30-datasheet.pdf
.. _NVIDIA A40: https://images.nvidia.com/content/Solutions/data-center/a40/nvidia-a40-datasheet.pdf
.. _NVIDIA A100: https://www.nvidia.com/content/dam/en-zz/Solutions/Data-Center/a100/pdf/nvidia-a100-datasheet.pdf
.. _Compute Capabilities: https://developer.nvidia.com/cuda-gpus