Data Storage

Scratch is designed to be a high-performance, low-latency, parallel file system and it is uniquely built for HPC systems. That means that data is allocated in parallel on several virtual disks intended to be read and write in parallel from several computers. In order for the scratch system to perform its best the files should be balanced across all disks and frequently used files should also being balanced.

In the page we will offer some tricks that will help you regain control of your scratch space in preparation to transferring your files to other locations.

Knowing my current usage

First we should know what we have in terms of used space and number of files

$ /usr/lpp/mmfs/bin/mmlsquota --block-size M gpfs01:scratch
                         Block Limits                                               |     File Limits
Filesystem Fileset    type             MB      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
gpfs01     scratch    USR         1943448   62914560   68157440        235     none |   557657 24000000 25000000      172     none

You can also get this information in GigaBytes

$ /usr/lpp/mmfs/bin/mmlsquota --block-size G gpfs01:scratch
                         Block Limits                                               |     File Limits
Filesystem Fileset    type             GB      quota      limit   in_doubt    grace |    files   quota    limit in_doubt    grace  Remarks
gpfs01     scratch    USR            1898      61440      66560          1     none |   557657 24000000 25000000      172     none

On this case I am using almos 1900 GB of data in half million files. I would like to reduce as much as possible before trying to move my data out of Scratch. We will focus on command line tricks that will help you search for Big files, search for small and numerous files. The idea is to attack first the biggest contributors to my usage. This examples will be as generic as possible. At the end of the day its up to you to decide which data is not longer needed and can be safely deleted.

Consider delete a irreversible procedure, there is a way to recover files that have been deleted very recently but do not count on it. Being sure that you delete what you actually want to delete. The commands below are powerful and with that power comes a danger too.

Searching for empty files

An empty file is still a file. Even if apparently not using space its metadata still exists and any transfer of that file is also a file operation. This is something that in most cases is harmless. Be careful, in some cases empty files are used for some codes to indicate run conditions. Their sole existence carries a meaning.

On Spruce every user has an environmental variable pointing to their personal Scratch folder. You can see that variable with:

$ echo $SCRATCH
/scratch/<USERNAME>

On this examples we will use this variable to allow you to copy and paste this commands in total generality

find $SCRATCH -size 0

I got thousands of files. Some of them I do not want to remove, but there are quite a number of files like this:

./OrbitalGAF/58d36a1374d8b558e47f9623/58d36a1374d8b558e47f9623.e1165669
./OrbitalGAF/58d46c6d74d8b558e47f9cf7/58d46c6d74d8b558e47f9cf7.o1172814
./OrbitalGAF/58d46c6d74d8b558e47f9cf7/58d46c6d74d8b558e47f9cf7.e1172814
./OrbitalGAF/58d36a1374d8b558e47f9629/58d36a1374d8b558e47f9629.o1165671
./OrbitalGAF/58d36a1374d8b558e47f9629/58d36a1374d8b558e47f9629.e1165671
./OrbitalGAF/58d46c6d74d8b558e47f9cc7/58d46c6d74d8b558e47f9cc7.o1172802
./OrbitalGAF/58d46c6d74d8b558e47f9cc7/58d46c6d74d8b558e47f9cc7.e1172802
./OrbitalGAF/58d46c6d74d8b558e47f9cca/58d46c6d74d8b558e47f9cca.o1172803
./OrbitalGAF/58d46c6d74d8b558e47f9cca/58d46c6d74d8b558e47f9cca.e1172803
./OrbitalGAF/58d46c6d74d8b558e47f9cd3/58d46c6d74d8b558e47f9cd3.o1172806
./OrbitalGAF/58d46c6d74d8b558e47f9cd3/58d46c6d74d8b558e47f9cd3.e1172806
./OrbitalGAF/58d46c6d74d8b558e47f9cd6/58d46c6d74d8b558e47f9cd6.o1172807
./OrbitalGAF/58d46c6d74d8b558e47f9cd6/58d46c6d74d8b558e47f9cd6.e1172807
./OrbitalGAF/58d46c6d74d8b558e47f9cd9/58d46c6d74d8b558e47f9cd9.o1172808
./OrbitalGAF/58d46c6d74d8b558e47f9cd9/58d46c6d74d8b558e47f9cd9.e1172808
./OrbitalGAF/58d46c6d74d8b558e47f9cf4/58d46c6d74d8b558e47f9cf4.o1172813
./OrbitalGAF/58d46c6d74d8b558e47f9cf4/58d46c6d74d8b558e47f9cf4.e1172813
./OrbitalGAF/58d36a1374d8b558e47f963e/58d36a1374d8b558e47f963e.o1165678
./OrbitalGAF/58d36a1374d8b558e47f963e/58d36a1374d8b558e47f963e.e1165678
./OrbitalGAF/58d46c6d74d8b558e47f9cb5/58d46c6d74d8b558e47f9cb5.o1172796
./OrbitalGAF/58d46c6d74d8b558e47f9cb5/58d46c6d74d8b558e47f9cb5.e1172796
./OrbitalGAF/58d46c6d74d8b558e47f9cbe/58d46c6d74d8b558e47f9cbe.o1172799
./OrbitalGAF/58d46c6d74d8b558e47f9cbe/58d46c6d74d8b558e47f9cbe.e1172799
./OrbitalGAF/58d46c6d74d8b558e47f9cc1/58d46c6d74d8b558e47f9cc1.o1172800
./OrbitalGAF/58d46c6d74d8b558e47f9cc1/58d46c6d74d8b558e47f9cc1.e1172800
./OrbitalGAF/58d46c6d74d8b558e47f9cc4/58d46c6d74d8b558e47f9cc4.o1172801
./OrbitalGAF/58d46c6d74d8b558e47f9cc4/58d46c6d74d8b558e47f9cc4.e1172801
./OrbitalGAF/58d36a1374d8b558e47f9653/58d36a1374d8b558e47f9653.o1165683

All those are empty files created by Torque/Moab from simulations that I have carried out in the past. Those files in the format:

<JOB_NAME>.o<JOB_ID_NUMBER>
<JOB_NAME>.e<JOB_ID_NUMBER>

Those files contain the standard Output (o) and Standard Error (e) from the jobs that run on Spruce. Many of those files are empty because on my executions I usually send the output of the command to some other file and because the codes that I use do not write anything on the standard error. Deleting those empty files will not help me with my storage usage by it will reduce the number of files that I will have to move later on.

I can target those files with a more specific command:

find $SCRATCH -size 0 -regextype posix-egrep  -regex ".*\.[o,e][0-9]{5,7}"

The command above search for all empty files on Scratch that contains. On my account I found thousands of cases that I can safely delete. In most cases those files are not important to you to keep. They are empty and not used at all other than telling you the JOB_ID that you probably do not need for old jobs. If you feel confortable deleting those files you can execute:

find $SCRATCH -size 0 -regextype posix-egrep  -regex ".*\.[o,e][0-9]{5,7}" -exec rm {} \;

On my case it help me delete 16113 files

Searching for Big Files

Some problems and codes produce big files. In some cases those files are not needed to extract the actual answers for which you run those calculations and could be deleted safely. The first step is knowing where they are. This command search for files bigger than 2GB

find $SCRATCH -size +2G

You can increase or decrease the value if you are getting too many or very few. Once you identify the biggest files you can use the their names to target those files for deletion. Imagine that the files called “WAVECAR” can be deleted. You can use this command”

find $SCRATCH -size +2G -name "WAVECAR" -exec rm {} \;