📢 Do not run intensive jobs on Fomalhaut
This is a copy of an announcement sent to the CICA cluster mailing list (02/26)
Dear CICA users,
This is the second CICA announcement in this batch. This one is an important notice/reminder, especially for new users.
The node "fomalhaut" is a login node. It also runs a lot of monitoring and control software for the cluster. It is a low-spec machine compared to all our other nodes.
If someone runs a task on fomalhaut that takes a lot of CPU or memory resources, everyone else's access to the cluster may slow down. If you start a job that runs away and uses all the memory, there is a risk of crashing the node, which would be very disruptive.
This is why the login message says:
********************************************************************
** Fomalhaut is a login node only! Please be considerate to other **
** users and do not start long-running or intensive jobs on this **
** machine. Use the compute nodes (wherever possible via slurm). **
********************************************************************
We hope this is clear enough, but many users (especially new users) seem to miss it, or struggle with alternatives. For example, there is a jupyterlab job running on fomalhaut right now using ~10% of the memory. This is not good.
We hope the new jupyterhub server can help at least some users who are tempted to start jobs on fomalhaut.
If anyone has any problem where the solution seems to be "just run it on fomalhaut", please contact cica-admin for advice first. Almost always there is a reasonable alternative, even if it requires a little more effort up-front.
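As an illustration of one such alternative, here is a minimal sketch in Python that builds the text of a Slurm batch script. It is not an official CICA template: the function name, job name, resource values and the example command are all placeholder assumptions, and the cluster may require further options (for example a partition) that are not shown here.

```python
def make_batch_script(command, job_name="myjob", cpus=4, mem_gb=8,
                      time_limit="01:00:00"):
    """Return the text of a minimal Slurm batch script.

    All default values are illustrative assumptions, not cluster policy.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",      # name shown in the queue
        f"#SBATCH --cpus-per-task={cpus}",     # CPU cores requested
        f"#SBATCH --mem={mem_gb}G",            # memory requested per node
        f"#SBATCH --time={time_limit}",        # wall-clock limit (HH:MM:SS)
        "",
        command,                               # the actual workload
    ]
    return "\n".join(lines) + "\n"
```

For example, the output of `make_batch_script("python my_analysis.py")` (a hypothetical command) could be saved to a file and submitted with `sbatch`, so that the work runs on a compute node rather than on fomalhaut.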
Here are a few more guidelines:
- It is OK to do basic shell tasks and run text editors on fomalhaut;
- It is OK to run a compiler as long as it doesn't use more than a few cores (if your build needs more than that, compile on one of the CPU nodes instead);
- It is borderline OK to run a VNC session (please try to avoid running GNOME and keep web browser usage to a minimum, preferably zero -- use your own browser, which will be much faster!);
- VSCode sessions are currently a grey zone. Although it is possible to tunnel these sessions to the other nodes, we do not have a streamlined solution for this, so you will mostly get away with it for now -- but please watch your usage carefully. Please read the VSCode wiki page.
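To make "watch your usage" concrete, here is a rough sketch in Python of how you might list your own processes on fomalhaut and flag the heavy ones. It is only an illustration: the function names and the thresholds (50% CPU, 5% memory) are assumptions chosen for the example, not cluster policy, and this is not the monitoring system the admins use.

```python
import getpass
import subprocess


def snapshot():
    """Return raw `ps` output (pid, %CPU, %MEM, command) for the current user."""
    cmd = ["ps", "-u", getpass.getuser(),
           "-o", "pid=", "-o", "pcpu=", "-o", "pmem=", "-o", "comm="]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def parse_ps(output):
    """Parse header-less `ps` output into a list of dicts."""
    procs = []
    for line in output.splitlines():
        parts = line.strip().split(None, 3)
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        pid, pcpu, pmem, comm = parts
        procs.append({"pid": int(pid), "cpu": float(pcpu),
                      "mem": float(pmem), "comm": comm})
    return procs


def heavy_processes(procs, cpu_limit=50.0, mem_limit=5.0):
    """Return processes above either threshold (illustrative values only)."""
    return [p for p in procs if p["cpu"] > cpu_limit or p["mem"] > mem_limit]
```

Calling `heavy_processes(parse_ps(snapshot()))` on the login node would then return any of your processes above the chosen thresholds, which you could move to a compute node or stop.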
Sometime in the near future, we will implement an automatic system that immediately kills any job using an unacceptable amount of resources on fomalhaut. Since so many people rely on the cluster to get important work done, we have to be strict about the fair use of resources.