# Scratch Space

Scratch space (on our cluster) means local solid state drives attached to individual nodes. CPU nodes c01 to c04 have 900GB of scratch each, c05 to c13 have 440GB each, and the memory nodes m01 and m02 have 10TB each.

These local drives are available under the /scratch directory on each node. 'Local' means that a given node cannot see the /scratch directories on any other node. The scratch space on the cpu nodes is slightly different from that on the mem nodes, as described below. The gpu nodes currently do not have /scratch.

Using scratch can speed up your code if it spends a lot of time on IO. Reading and writing to /scratch is much faster than using the /data disks, because the drives are ~10x faster and the data don't have to travel over the network.
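
If you want to see the difference for yourself, you can time a large sequential write to each filesystem. A rough sketch (the test file paths are arbitrary; `conv=fsync` forces a final flush so that caching doesn't flatter the result):

```bash
# Write 1GiB to local scratch, flushing to disk before dd reports a speed.
dd if=/dev/zero of=/scratch/$USER/ddtest bs=1M count=1024 conv=fsync
# The same write to the networked /data disks (adjust the path to your own space).
dd if=/dev/zero of=/data/$USER/ddtest bs=1M count=1024 conv=fsync
# Remove the test files afterwards.
rm /scratch/$USER/ddtest /data/$USER/ddtest
```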

Good use of scratch requires some thought and planning. It helps to understand your code's IO pattern and think about how it can be optimized without using scratch first.

Finally, remember that scratch is a finite resource shared by all the users. Please tidy up after your jobs.

## Summary of how to use scratch

  1. Read the notes below to understand the differences between the cpu and mem node /scratch. Make sure you understand the scratch policy including the automatic removal of old files on the cpu nodes. Think carefully about how and when to use scratch.

  2. Create a directory for your username under /scratch (cpu nodes) or /scratch/u (mem nodes), if it doesn't already exist.

  3. Run your job in the queue as usual, reading and writing to /scratch. To make sure your job only starts on a cpu node with scratch (c05-c13), you can add the line `#SBATCH --constraint=scratch` to your job script.

  4. When your job has finished, try to delete any files in /scratch that you don't need to keep, and move anything you want to keep to the /data disks.
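
Putting those steps together, a minimal cpu-node job script might look like the following sketch. The job name, the `my_code` executable, the output file, and the /data destination are placeholders for your own.

```bash
#!/bin/bash
#SBATCH --job-name=scratch_example   # hypothetical job name
#SBATCH --constraint=scratch         # only start on a cpu node with /scratch
#SBATCH --ntasks=1

# Create your scratch directory on this node (it may not exist yet).
SCRATCH_DIR=/scratch/$USER
mkdir -p "$SCRATCH_DIR"

# Run your program, reading and writing under $SCRATCH_DIR.
./my_code --workdir "$SCRATCH_DIR"

# Keep what you need on /data, then tidy up the scratch files.
cp "$SCRATCH_DIR"/results.dat /data/$USER/
rm -rf "$SCRATCH_DIR"/*
```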

## Scratch policy

Please put your scratch files under your own directory: /scratch/**username** (for example /scratch/apcooper). You might need to create this directory if it doesn't exist. Remember, the disks are local to each node, so you will have to create this directory on every node you use. You probably want to do this automatically in your slurm job script.

To help other users, please delete any files you write to scratch as soon as possible (ideally, your script should delete the files it creates as soon as your job completes successfully). If the files you create on scratch have long-term value, you can move them to your /data space.

This policy is only enforced very loosely. It is mostly up to you to decide how long to keep your files. You can store them on /scratch for several days or weeks if that helps you, as long as you are actively doing something with them. Scratch is not for long-term storage; please keep in mind that other users might need the space.

On the cpu node /scratch we will automatically delete files that haven't been accessed for a long time. Please don't rely on this to tidy up after your jobs; do that yourself.

You don't have to delete your /scratch directories, only files.

If you use /scratch often, you might want to add code to your slurm job scripts to make sure that enough space is available on your nodes before the job starts, and handle the tidying up after the job has finished (perhaps doing different things depending on whether it succeeded or not).
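
For example, a sketch along those lines for a cpu node; the 100GB threshold, `my_code`, and the output paths are illustrative choices:

```bash
# Refuse to start if less than 100GB is free on this node's /scratch.
FREE_GB=$(df --output=avail -BG /scratch | tail -n 1 | tr -dc '0-9')
if [ "$FREE_GB" -lt 100 ]; then
    echo "Not enough scratch space on $(hostname), giving up." >&2
    exit 1
fi

RUN_DIR=/scratch/$USER/run_$SLURM_JOB_ID
mkdir -p "$RUN_DIR"

# Tidy up however the job ends; keep the output only if the code succeeded.
cleanup() {
    if [ "$JOB_OK" = 1 ]; then
        cp "$RUN_DIR"/output.dat /data/$USER/
    fi
    rm -rf "$RUN_DIR"
}
trap cleanup EXIT

./my_code --workdir "$RUN_DIR" && JOB_OK=1
```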

## CPU Node Scratch

Each of the nodes c01 to c04 has 900GB of local storage. Nodes c05 to c13 have 440GB of local storage. This is provided by one SSD drive in each node (Seagate IronWolf Pro in c01-c04, Seagate Nytro 1351 in c05-c13; the Nytro disks are faster but have lower capacity).

Tasks running on a given node will only have access to /scratch on the same node.

You might want to use the --exclusive sbatch option to reserve whole nodes for your scratch jobs, so that other users' jobs don't compete for scratch space at the same time. This will be more important if your job uses a large fraction of the total scratch space on a node.
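
In a job script, the two directives combine like this:

```bash
#SBATCH --exclusive            # reserve the whole node for this job
#SBATCH --constraint=scratch   # and make sure the node has /scratch
```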

### CPU Node scratch policy

Files on the cpu node /scratch that are not associated with active jobs may be deleted without warning by the administrators. Unless the disks are nearly full, files on /scratch will be deleted automatically when they haven't been accessed for 7 days. If a disk is filling up, files older than 1 day will be deleted.
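
You can check which of your files are at risk yourself; a quick sketch, assuming your files live under /scratch/$USER:

```bash
# List files that have not been accessed for more than 7 days,
# i.e. those the automatic cleanup would remove.
find /scratch/$USER -type f -atime +7
```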

:warning: The bottom line is that no data on the cpu node scratch is 'safe' unless it is associated with a running job. If you really want to keep it, move it to /data as soon as you can. Otherwise, assume it can disappear at any time.

## Memory Node Scratch

The nodes m01 and m02 have ~10TB of scratch space each (2 Seagate Nytro 3532 SSDs, combined into one logical volume), organized into two top-level directories:

  - /scratch/u
  - /scratch/db

Unless you are running a database, use /scratch/u.

If your jobs are accessing a local database (for example, MySQL or PostgreSQL) then you can store the database under /scratch/db. That path has some filesystem parameters set to optimize database access (including small record size and no filesystem-level sync).
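
As a rough sketch, a PostgreSQL instance could be set up there as follows (the `pgdata` directory and log file names are arbitrary choices, and the PostgreSQL binaries are assumed to be on your PATH):

```bash
# Initialize and start a PostgreSQL server with its data directory
# on the database-tuned scratch area.
mkdir -p /scratch/db/$USER
initdb -D /scratch/db/$USER/pgdata
pg_ctl -D /scratch/db/$USER/pgdata -l /scratch/db/$USER/pg.log start
```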

/scratch/u and /scratch/db share the same 10TB of available space. This space uses ZFS compression, so files may take up less space on /scratch than they do on /data.

### Memory Node scratch policy

Unlike the cpu nodes, we currently don't have an automatic cleanup policy for the memory nodes. Please try hard to keep your usage to only what you need for active projects. If there is not enough space, please negotiate directly with other users, or contact the admins.

We will occasionally check for files that have not been accessed in a long time. We will not delete anything without asking.

## When to use scratch

To benefit from using scratch, you need to have some idea of why it will help, and a plan for using it.

A code might benefit from using /scratch if it:

  - does lots of small IO operations involving temporary files (including memory mapping);
  - writes intermediate 'restart' files that are not useful after the calculation is complete;
  - separates IO by MPI task, rather than having all tasks reading and writing a single large file.

Our CPU nodes already have a lot of memory per core, but using scratch space for temporary storage may still help memory-limited codes to use more cores per node.
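
Many codes and libraries write their temporary files wherever the standard TMPDIR environment variable points, so you can often redirect them to scratch from the job script alone. A minimal sketch, with `my_code` standing in for your own program:

```bash
# Point codes that honour the TMPDIR convention at local scratch.
export TMPDIR=/scratch/$USER/tmp
mkdir -p "$TMPDIR"
./my_code
rm -rf "$TMPDIR"   # tidy up afterwards
```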

On the memory nodes, /scratch might also be useful for interactive work with very large files. If you have a 100GB (or 1TB) dataset that you will need to read many times over a short period (1 week or 1 month, for example), you can copy it to /scratch.

:warning: Please think very carefully about the time it takes to copy a huge file from /data to /scratch over the network. If you only need to read the file once, there is no point copying it to /scratch first: that won't save any time, and you might as well read it directly from /data in your job.
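
If you do expect many reads, staging the file is a one-off copy; a sketch for the memory nodes (the dataset name and /data path are placeholders):

```bash
# Stage the dataset once, then read it repeatedly from fast local disk.
mkdir -p /scratch/u/$USER
cp /data/$USER/big_catalogue.hdf5 /scratch/u/$USER/
```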

## Advanced usage

  - Consider /dev/shm (a shared filesystem in memory) as an alternative to /scratch. /dev/shm is available on all nodes, up to half the memory on the node or the memory limit of your job (whichever is smaller). If you use /dev/shm, please tidy up after your job immediately and carefully (see the sketch after this list).

  - Read about the slurm `sbcast` command, which copies a file to local storage on the nodes allocated to a job.
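
A sketch touching both points; `my_code`, its `--tmpdir` option, and the file names are hypothetical, and `sbcast` must be run inside a job allocation:

```bash
# Memory-backed temporary space: fast, but it counts against the node's RAM.
mkdir -p /dev/shm/$USER
./my_code --tmpdir /dev/shm/$USER
rm -rf /dev/shm/$USER   # clean up immediately and carefully

# sbcast copies a file to local storage on every node allocated to the job,
# e.g. to stage a common input file to each node's /scratch:
sbcast /data/$USER/input.dat /scratch/$USER/input.dat
```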