Midway Essentials

One of the first things you should do when you join the Dinner group is connect to Midway, our computing cluster at UChicago. To access Midway, you will first need an RCC account. This gives you access to the two clusters we use most often - Midway2 and Midway3. These two clusters are not currently cross-mounted, meaning that you have to log onto each one separately and files are not shared between them. Access to Midway3 also comes with access to Beagle3, a GPU cluster. Midway3 and Beagle3 are cross-mounted, meaning you can access both by just logging into Midway3. Similarly, access to Midway2 comes with access to GM4, another GPU cluster.

To get an RCC account, first fill out the General User Account Request form. Aaron's PI account name is pi-dinner. You might need to give a rough overview of the type of work you will be doing, but these fields don't matter a huge amount - take a reasonable guess based on what you are interested in, and that should be fine. Normally, these requests are processed in a week or less. If you have any questions (about setting up an account, or just in general), feel free to email [email protected] and ask. They are, by and large, a very helpful resource if you are having any problems with Midway (installing certain modules, troubleshooting unexpected behavior in SLURM, etc.) throughout your research career.

Once you have an RCC account, you can access Midway through a variety of methods. The linked page is a good guide for doing this. Probably the most popular way to connect is through ThinLinc, either by downloading the desktop client or by accessing it via a browser. Every time you log in or move files, you will need to use two-factor authentication.
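If you prefer a plain terminal over ThinLinc, SSH also works. A minimal sketch of `~/.ssh/config` entries, assuming the standard RCC login hostnames (`midway2.rcc.uchicago.edu` and `midway3.rcc.uchicago.edu`) and using `your_cnetid` as a placeholder for your CNetID:

```shell
# Add shortcuts so that `ssh midway3` connects directly.
# Hostnames assume the standard RCC login addresses; replace your_cnetid.
mkdir -p ~/.ssh
cat >> ~/.ssh/config <<'EOF'
Host midway2
    HostName midway2.rcc.uchicago.edu
    User your_cnetid
Host midway3
    HostName midway3.rcc.uchicago.edu
    User your_cnetid
EOF
```

You will still be prompted for your password and the two-factor step on every connection.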

I highly recommend reading through the relevant user guides (Midway2 User Guide, Midway3 User Guide, GM4 User Guide, Beagle3 User Guide) to familiarize yourself with the system and available options. Many workflows are possible, and it's up to you to decide what is best for you and your productivity. On Midway2, you should have access both to your own personal /scratch/midway2/${CNET_ID} directory and to the shared capacity /project2/dinner/ directory. On Midway3, you should also have access to the corresponding scratch (/scratch/midway3/${CNET_ID} and /scratch/beagle3/${CNET_ID}) and capacity (/project/dinner/ and /beagle3/dinner/) directories. Each of these has quotas and limits on the number and size of files that can be stored there. You should also have access to the compute hours (SUs) associated with pi-dinner. Commands to check these hours/limits can be found in Running Jobs on Midway.
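Once you are logged in, it is worth checking these numbers yourself from time to time. A sketch of the two checks, assuming RCC's `quota` and `rcchelp balance` utilities (these exist only on Midway, so the commands are just assembled into a script here rather than run - see Running Jobs on Midway for the authoritative list):

```shell
# Sketch, assuming RCC's `quota` and `rcchelp balance` tools on Midway.
cat > check_usage.sh <<'EOF'
#!/bin/bash
quota              # per-directory storage quotas, limits, and current usage
rcchelp balance    # remaining SUs on pi-dinner
EOF
bash -n check_usage.sh && echo "syntax OK"
```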

General Advice

  • When you initially log in, you are put onto a login node. We are not charged any SUs for using login nodes. They are meant for lightweight tasks like directory navigation, writing code, etc. If you plan to do anything more resource-intensive (large mathematical computations, molecular dynamics, VMD visualization, etc.), you should instead use a compute node. Details on submitting jobs to our compute nodes can be found in Running Jobs on Midway. Note that if you overload the login nodes, Midway will kick you off as a slap on the wrist.
  • Make sure you are not writing in your home directory - it has very limited storage, and going over your quota can cause problems. A suggested workflow is to work in your scratch directory for initial/exploratory work, transfer to your capacity/project directory once you move on to a more intermediate stage, and then transfer old files to long-term storage (/cds/weare-dinner/ on Midway2). Scratch is not backed up, and has a relatively low file number/size quota but a very high hard limit. If you go over the quota (but are below the limit), you have a 30-day grace period to get back under the quota. You only lose write permissions if you either go over the limit or exceed the grace period. Capacity directories are backed up multiple times per day, so any work put there can be recovered if something goes wrong. You can access these backups by navigating to the corresponding snapshot directory (found in /snapshots/). Long-term storage in /cds/ can only be accessed from login nodes, so you can't run any jobs on it.
  • Explore the SLURM/RCC commands listed in Running Jobs on Midway - they will give you a good idea of what you have access to and what you do not, as well as how much storage/computation time is available. Try to keep your SU and data usage reasonable, knowing that we all share these resources. We get a new batch of SUs every September - Aaron will ask for SU estimates so we can plan to use them throughout the year.
  • Try not to hog all of the space on the private partitions (weare-dinner, weare-dinner2, and dinner) if you can avoid it. While these nodes are there to be used, keep in mind that they are the go-to resources for data manipulation and molecular visualization, as well as day-to-day work with Jupyter notebooks, for example. It is good practice to keep at least a node or two available for others to use. If you need a significant portion of either partition for any length of time, please run it by the other lab members first, either in person or on Slack.
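To make the login-node vs. compute-node point concrete, here is a minimal sketch of a SLURM batch script targeting one of the private partitions. The job name, time, and task counts are illustrative, not lab defaults; see Running Jobs on Midway for the real options.

```shell
# Minimal SLURM batch script sketch (values illustrative; adjust to your job).
cat > example.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=pi-dinner
#SBATCH --partition=dinner        # or weare-dinner / weare-dinner2 on Midway2
#SBATCH --time=01:00:00
#SBATCH --ntasks=1

echo "Running on $(hostname)"
EOF
bash -n example.sbatch && echo "syntax OK"
```

You would submit this with `sbatch example.sbatch` from a login node; the work itself then runs on a compute node and is charged SUs.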
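The scratch → capacity workflow suggested above can be sketched with local stand-in directories (the names here are placeholders; on Midway you would use your actual paths, e.g. /scratch/midway3/${CNET_ID} and /project/dinner/):

```shell
# Stand-ins for the real scratch and capacity paths on Midway.
mkdir -p scratch_demo/myproject project_demo
echo "trajectory data" > scratch_demo/myproject/results.txt

# Archive-mode copy preserves timestamps and permissions; `rsync -av` follows
# the same pattern and is restartable, which matters for large transfers.
cp -a scratch_demo/myproject project_demo/
ls project_demo/myproject
# → results.txt
```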