User Policy and Best Practice - nthu-ioa/cluster GitHub Wiki

Storage and I/O

Writing and Reading

:warning: Important: there is no backup of any data on the cluster. If you want a backup, you must make it yourself!


Please keep in mind that job I/O to the shared file systems happens over the network. The following practices waste time, storage space and network bandwidth, so please avoid them.

  • :stop_sign: Using text formats (ASCII files) to store large amounts of data (more than about 10 MB).

  • :stop_sign: Writing thousands of tiny files.

  • :stop_sign: Opening, closing and reopening files repeatedly. On a network file system each open and close is relatively slow, and closing a file flushes its cached data, which cancels much of the benefit of caching.

  • :stop_sign: Having multiple processes all trying to write to the same file. This can lead to deadlocks and lots of wasted network bandwidth.

Instead:

  • :green_circle: Use a binary file format (e.g. NumPy .npz, FITS, HDF5)

  • :green_circle: Group similar data into a single large file (HDF5, FITS etc. can help with this too)

  • :green_circle: Keep file handles open, and avoid forcing sync operations unless you really need them.

  • :green_circle: Think carefully about I/O in parallel or multithreaded jobs. Consider writing one file per process, or designating a single process to handle all I/O.
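The binary-format and single-file suggestions can be sketched in Python with NumPy's savez_compressed: instead of writing one ASCII file per step, the job collects its arrays and writes them into one compressed .npz archive with a single open and close. This is a minimal sketch, not a prescribed workflow; save_snapshots, load_snapshot, the step_NNNN keys and the example data are hypothetical, and a temporary directory stands in for your /data space.

```python
import tempfile
from pathlib import Path

import numpy as np


def save_snapshots(path, snapshots):
    """Write a dict of {name: array} into ONE compressed binary .npz file.

    One file and one open/close on the network file system, instead of
    one (text) file per array.
    """
    np.savez_compressed(path, **snapshots)


def load_snapshot(path, name):
    """Read a single named array back from the .npz archive."""
    with np.load(path) as archive:  # NpzFile supports the context-manager protocol
        return archive[name]


# Hypothetical data: 1000 small per-step arrays that might otherwise have been
# written as 1000 separate ASCII files.
snapshots = {f"step_{i:04d}": np.arange(10, dtype=np.float64) * i for i in range(1000)}

# A temporary directory stands in for your /data space in this example.
out_dir = Path(tempfile.mkdtemp())
out_path = out_dir / "snapshots.npz"
save_snapshots(out_path, snapshots)

print(load_snapshot(out_path, "step_0002")[:3])  # [0. 2. 4.]
```

savez_compressed keeps everything in memory until the single write, which suits results of moderate size. In a parallel job the same idea applies if only one designated process (for example MPI rank 0) calls the writer; HDF5 or FITS may fit better when data must be appended incrementally.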

Best practice: for your own benefit and that of other users, please have your compute jobs read from and write to /data only. The /home file system is mounted on our login node and is not designed to handle heavy I/O and the network traffic it generates. Be aware of 'hidden' reads and writes to /home in batch jobs (for example, error logs, core dumps, or temporary and cache files that programs place under your home directory by default). Please direct all such I/O to /data (or to local scratch disks) instead.
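One way to catch 'hidden' writes from a Python job is to point the usual temporary and cache locations at your /data space before any heavy work starts. This sketch assumes the programs involved honour TMPDIR (Python's tempfile module does), the XDG_CACHE_HOME convention, and matplotlib's MPLCONFIGDIR; other tools may use different variables, and the tmp and .cache layout under a job directory is only an example. The changes apply to the current process and the child processes it launches; in a batch script you can export the same variables before starting the job.

```python
import os
import tempfile
from pathlib import Path


def redirect_scratch(data_root):
    """Point temporary and cache directories of this process at `data_root`.

    Affects only the current process and the child processes it starts.
    Returns the new temporary directory.
    """
    root = Path(data_root)
    tmp_dir = root / "tmp"
    cache_dir = root / ".cache"
    mpl_dir = cache_dir / "matplotlib"
    for d in (tmp_dir, cache_dir, mpl_dir):
        d.mkdir(parents=True, exist_ok=True)

    os.environ["TMPDIR"] = str(tmp_dir)            # used by Python's tempfile and many Unix tools
    os.environ["XDG_CACHE_HOME"] = str(cache_dir)  # XDG cache location, otherwise ~/.cache
    os.environ["MPLCONFIGDIR"] = str(mpl_dir)      # matplotlib config and cache directory
    tempfile.tempdir = None  # make tempfile re-read TMPDIR on its next call
    return tmp_dir


# Demo: a fresh temporary directory stands in for e.g. /data/me/myjob.
job_root = tempfile.mkdtemp()
scratch = redirect_scratch(job_root)
with tempfile.NamedTemporaryFile() as f:
    print(Path(f.name).parent == scratch)  # True
```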

Symbolic Links

For convenience, many users like to create a symbolic link of the form /cluster/home/me/data -> /data/me. This is mostly harmless.

However, please avoid the following:

  • :stop_sign: Writing to your /data space through a symlink under /home, particularly from I/O-intensive batch jobs. This creates needless network traffic.

  • :stop_sign: Making symbolic links to individual files, especially files that you access through the link with frequent, heavy I/O.

  • :stop_sign: Creating hidden loops of symbolic links, especially between /home and /data; for example linking /home/me/myproject/data -> /data/me/myproject and then /data/me/myproject/scripts -> /home/me/myproject. The worst loops, of course, directly connect symbolic links to each other (do not do that).
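If a job is handed a path that may go through a symlink under /home, a small Python helper can resolve the path once at start-up, so that all later I/O goes directly to the real location, and can fail early when symbolic links point directly at each other. This is a sketch under stated assumptions: HOME_ROOTS, real_io_path and is_under are hypothetical names, and the two home prefixes are guesses based on the paths mentioned on this page, so adjust them to the cluster's actual layout. Directory-level loops such as the /home and /data example above still resolve normally, so this check does not detect them. Path.is_relative_to needs Python 3.9 or later.

```python
import tempfile
from pathlib import Path

# Assumed prefixes of the home file system; adjust to the cluster's layout.
HOME_ROOTS = ("/home", "/cluster/home")


def real_io_path(path):
    """Resolve every symlink in `path` once, so later I/O uses the real location.

    Raises ValueError if the path does not exist or cannot be resolved
    (e.g. symbolic links that point directly at each other).
    """
    try:
        return Path(path).resolve(strict=True)
    except (OSError, RuntimeError) as exc:
        # Symlink loops raise RuntimeError before Python 3.13 and OSError from 3.13 on.
        raise ValueError(f"cannot resolve {path!r}: {exc}") from exc


def is_under(path, roots=HOME_ROOTS):
    """True if `path` (already resolved) lies inside one of `roots`."""
    return any(Path(path).is_relative_to(root) for root in roots)


# Demo in a temporary directory: "home_data" plays the role of /home/me/data,
# a symlink to the real directory "data_me" (standing in for /data/me).
base = Path(tempfile.mkdtemp())
real_dir = base / "data_me"
real_dir.mkdir()
link = base / "home_data"
link.symlink_to(real_dir)

out_dir = real_io_path(link)  # resolve once at job start, then use out_dir everywhere
print(out_dir == real_dir.resolve())  # True
if is_under(out_dir):
    print("warning: output resolves to the home file system; write to /data instead")

# Two links pointing directly at each other cannot be resolved and are reported.
(base / "loop_a").symlink_to(base / "loop_b")
(base / "loop_b").symlink_to(base / "loop_a")
try:
    real_io_path(base / "loop_a")
except ValueError:
    print("symlink loop detected")
```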

Suggestion: rather than a symlink under /home, consider a shell alias to quickly cd to your data space, e.g. alias cdd='cd /data/me', and/or an environment variable set with export DATA=/data/me (for example in your shell startup file), so that cd $DATA takes you there.
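Scripts can use the same environment variable instead of hard-coding paths or going through a symlink under /home. A minimal sketch, assuming DATA has been set to your /data space (for example with export DATA=/data/me in your shell startup file); data_dir is a hypothetical helper name, and the project and file names are illustrative only.

```python
import os
from pathlib import Path


def data_dir():
    """Return your /data space as given by the DATA environment variable."""
    value = os.environ.get("DATA")
    if not value:
        # Fail loudly rather than silently falling back to somewhere under /home.
        raise RuntimeError("DATA is not set; e.g. add 'export DATA=/data/me' to your shell startup file")
    return Path(value)


os.environ["DATA"] = "/data/me"  # demo only; normally set in your shell, not in the script
out_file = data_dir() / "myproject" / "results.npz"
print(out_file)  # /data/me/myproject/results.npz
```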