Data Storage

On a desktop machine, you normally own the whole hard drive and you are free to use it up to the limit of the physical drive. On a shared resource like a cluster, limits need to be imposed to avoid one user taking too much space or creating too many files, which could prevent other users to store their data or could reduce the overall performance of the drive with too many files.

There are two important locations from the user’s perspective, $HOME and $SCRATCH those locations sit on a high-performance storage device and uses GPFS as their file system. Those locations are also shared between all compute nodes, so no matter which machine you run, files on those locations are always available. In this quick start, we will describe briefly the purpose of each one of them.

Home Directory

Home directories on Spruce Knob and Thorny Flat are located in /users directory and they are the default login location for users. The default disk quota for each user is 10 GB. As such, we recommend that home directories use should be primarily to store scripts, binary executables, and all the software machinery to do your research, in cases where data is too large, the $HOME folder should not be used for storing data as it could fill quickly. For research data, users should use Scratch space on our clusters.

The good thing about $HOME folder, despite its limited size, is that all the data is backed up on a daily basis via snapshots and tape. Users can retrieve up to 4 weeks of older files from /users/.snapshots/{date}/$USER if needed. If files are not available in the snapshot, please contact helpdesk@hpc.wvu.edu and the HPC team will attempt to retrieve the data from the tape.

Note: Users may check their storage quota via the following command:

quota

This command is actually an alias for the GPFS command to manage quotas on the filesystem, the actual command is:

/usr/lpp/mmfs/bin/mmlsquota --block-size M

The block size can be changed to K for Kilo Bytes, M for MegaBytes, G for Giga Bytes, and T for Tera Bytes.

It is important to pay attention not only to the storage size but also to the limit on the number of files that can be stored. Too many files have an impact on file system performance so their number needs to be limited. If you create too many to reach the limit you cannot create more files on that file system even if your quota size still shows allowance.

Scratch Directory

Scratch directories are located in /scratch directory and can be easily moved to use the $SCRATCH environment variable set-up by default for each user. You can go to your scratch folder with:

cd $SCRATCH

Scratch directories on our clusters are treated as TRUE scratch space: There is no user-defined quota for space. All user’s scratch directories share the same file space, which is set at 130 TB. When the scratch directory becomes full, files can be automatically deleted by date. To avoid losing data, all users should use their scratch space as truly temporary storage. However, we will notify users well in advance for scheduled file deletions to give users ample time to remove data to prevent data loss.

It is a good idea to create a symbolic link from your $HOME folder to go directly to your $SCRATCH folder, this command will do that:

ln -s $SCRATCH $HOME/scratch

Next time, you can just use the symbolic link:

cd ~/scratch

Note: Scratch data is NOT backed up.

Persistent Group Storage

Research groups may purchase long-term persistent storage from the RC HPC. This storage allows users to keep data on the cluster that will not be removed. In addition, this storage utilizes the same GPFS file system so files can be written directly to this storage from jobs executing on the cluster.

Persistent storage must be purchased in 1 TB chunks. To purchase this storage, go to https://helpdesk.hpc.wvu.edu , select “Open a New Ticket”, and select “Research Data Depot Storage Purchase” from Help Topic. For more information, please contact helpdesk@hpc.wvu.edu.

In the past researchers were allowed to purchase space under /group folder. For groups who have purchased this storage, it can be accessed here:

cd /group/{group_name}

Note: Persistent group storage is NOT backed up.

Note: Persistent Group storage has been replaced by the new Research Data Depot Service

Archival Storage

Please see the Research Data Depot Service for archival storage options.

For additional information please contact helpdesk@hpc.wvu.edu.