User Policy and Best Practice - nthu-ioa/cluster GitHub Wiki
Storage and I/O
Writing and Reading
:warning: This is important: There is no backup of any data on the cluster. If you want a backup, you need to make it yourself!
See also:
- Description of our storage.
- How to transfer large amounts of data between /home and /data and to/from the cluster.
Please keep in mind that job I/O happens over the network. The following practices waste time, storage space and network bandwidth, so please try to avoid them:
- :stop_sign: Using text formats (ASCII files) to store large amounts of data (> 10 MB).
- :stop_sign: Writing thousands of tiny files.
- :stop_sign: Opening, closing and reopening files repeatedly. These operations take a relatively large amount of time. When you close the file, data in the cache is flushed, reducing the benefit of caching.
- :stop_sign: Having multiple processes all trying to write to the same file. This can lead to deadlocks and lots of wasted network bandwidth.
Instead:
- :green_circle: Use a binary file format (e.g. NumPy npz, FITS, HDF5).
- :green_circle: Group similar data into a single large file (HDF5, FITS etc. can help with this too).
- :green_circle: Keep file handles open, and avoid forcing sync operations unless you really need them.
- :green_circle: Think carefully about I/O in parallel/multithreaded jobs. Consider writing one file per process or having one process designated to handle I/O.
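As a rough illustration of the first three recommendations, the sketch below writes the same numbers once as ASCII text and once as raw binary through a single open file handle, then compares file sizes. It uses only the Python standard library to stay dependency-free; the function names and file names are made up for this example, and in real jobs a self-describing format such as npz, FITS or HDF5 is usually the better choice.

```python
import os
import tempfile
from array import array


def write_text(path, values):
    """Write one value per line as ASCII text (the pattern to avoid for large data)."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v!r}\n")


def write_binary(path, chunks):
    """Write every chunk of floats to one binary file through a single open handle."""
    with open(path, "wb") as f:  # opened once, closed once
        for chunk in chunks:
            array("d", chunk).tofile(f)  # raw doubles, native byte order


def read_binary(path):
    """Read back all doubles stored by write_binary."""
    values = array("d")
    count = os.path.getsize(path) // values.itemsize
    with open(path, "rb") as f:
        values.fromfile(f, count)
    return values.tolist()


if __name__ == "__main__":
    # 100 chunks of 1000 values each, standing in for results produced by a job.
    chunks = [[(i + 1000 * j) / 7 for i in range(1000)] for j in range(100)]
    flat = [v for chunk in chunks for v in chunk]
    with tempfile.TemporaryDirectory() as tmp:
        text_path = os.path.join(tmp, "values.txt")
        bin_path = os.path.join(tmp, "values.bin")
        write_text(text_path, flat)
        write_binary(bin_path, chunks)
        print("text bytes:  ", os.path.getsize(text_path))
        print("binary bytes:", os.path.getsize(bin_path))
        print("round trip ok:", read_binary(bin_path) == flat)
```

The binary file stores each value in a fixed number of bytes and reads back exactly, whereas the text file needs a variable number of characters per value and has to be parsed again on every read.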
Best practice: for your own benefit and that of other users, please try to have your compute jobs write directly to (and read directly from) /data only. The /home system is mounted on our login node and is not designed to handle heavy I/O and the associated network traffic. Be aware of 'hidden' writes to (and reads from) /home in batch jobs (for example, error logs, core dumps, or temporary cache files). Please direct all such writes/reads to /data (or local scratch discs) instead.
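One way to keep such hidden writes off /home from inside a Python job is sketched below. It creates a per-job directory under a data root, points TMPDIR at a subdirectory there (child processes and many libraries consult this variable), and sends the log file to the same place. The helper names, the directory layout and the example path /data/me are assumptions made for illustration, not part of the cluster setup.

```python
import logging
import os
import tempfile
from pathlib import Path


def prepare_job_dirs(data_root, job_name):
    """Create <data_root>/<job_name>/tmp and point TMPDIR at it."""
    job_dir = Path(data_root) / job_name
    tmp_dir = job_dir / "tmp"
    tmp_dir.mkdir(parents=True, exist_ok=True)
    # Processes started after this point inherit TMPDIR from the environment.
    os.environ["TMPDIR"] = str(tmp_dir)
    return job_dir, tmp_dir


def make_logger(job_dir, name="job"):
    """Return a logger that writes to <job_dir>/job.log instead of the working directory."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(str(Path(job_dir) / "job.log"))
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Hypothetical data root; falls back to a throwaway directory for a dry run.
    data_root = "/data/me" if os.path.isdir("/data/me") else tempfile.mkdtemp()
    job_dir, tmp_dir = prepare_job_dirs(data_root, "example_job")
    log = make_logger(job_dir)
    # Passing dir= explicitly keeps this temporary file under the job directory.
    with tempfile.NamedTemporaryFile(dir=str(tmp_dir), suffix=".cache") as scratch:
        scratch.write(b"intermediate results")
        log.info("scratch file at %s", scratch.name)
```

Batch schedulers also write their own output and error files, so the paths given in the job script need the same attention.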
Symbolic Links
For convenience, many users like to create a symbolic link of the form /cluster/home/me/data -> /data/me. This is mostly harmless.
However, please avoid the following:
- :stop_sign: Writing to your /data space through a symlink under /home, particularly from I/O-intensive batch jobs. This creates needless network traffic.
- :stop_sign: Making symbolic links to files, especially files that you access with frequent heavy I/O through the symbolic link.
- :stop_sign: Creating hidden loops of symbolic links, especially between /home and /data; for example linking /home/me/myproject/data -> /data/me/myproject and then /data/me/myproject/scripts -> /home/me/myproject. The worst loops, of course, directly connect symbolic links to each other (do not do that).
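The sketch below shows one possible way to check for these problems from Python. `real_io_path` resolves a path once so that a job can open the real location rather than going through the link, and `find_symlink_cycles` walks a directory tree, following links, and reports any link that leads back to a directory already on the current route or that cannot be resolved because links point at each other. Both function names are invented for this example, and it has only been reasoned through for the layouts described above.

```python
import errno
import os


def real_io_path(path):
    """Return the symlink-free location of path, for use in I/O-heavy code."""
    return os.path.realpath(path)


def find_symlink_cycles(root):
    """Return symlinks reachable from root that close a loop of directories."""
    cycles = []

    def visit(directory, chain):
        real = os.path.realpath(directory)
        chain = chain + (real,)  # real directories on the route from root to here
        try:
            names = sorted(os.listdir(real))
        except OSError:
            return  # unreadable directory: nothing to report
        for name in names:
            path = os.path.join(real, name)
            if os.path.islink(path):
                try:
                    os.stat(path)  # follows the link
                except OSError as exc:
                    if exc.errno == errno.ELOOP:
                        cycles.append(path)  # links pointing at each other
                    continue  # broken link or loop: do not descend
            if not os.path.isdir(path):
                continue
            if os.path.realpath(path) in chain:
                cycles.append(path)  # leads back to a directory we came through
            else:
                visit(path, chain)

    visit(root, ())
    return cycles


if __name__ == "__main__":
    # Hypothetical project path; replace with a directory of your own.
    project = os.path.expanduser("~/myproject")
    if os.path.isdir(project):
        print("real location:", real_io_path(project))
        for link in find_symlink_cycles(project):
            print("symlink closes a loop:", link)
```

Running the check on a project directory before submitting a large job can catch a loop before a recursive copy or search follows it.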
Suggestion: rather than a symlink under /home, consider a shell alias to quickly cd to your data space, e.g. alias cdd='cd /data/me', and/or an environment variable, e.g. export DATA=/data/me.