Great Lakes Slurm Support - raeker/ARC-Wiki-Test GitHub Wiki
*** These changes must be run on gl-build.arc-ts.umich.edu ***
*** Any changes made must be pushed to the rest of the cluster using: sudo /usr/arcts/systems/scripts/ansibleSync.sh ***
- Run this command to create a login for a specified user on the operating system:
- sudo /usr/arcts/systems/scripts/addLinuxUser.sh <user>
*** You can pass several users at once as a comma-delimited list. See below ***
- sudo /usr/arcts/systems/scripts/addLinuxUser.sh <user1>,<user2>,<user3>
- Check to see if the user belongs to any Slurm Accounts. If they do, create their scratch directories for those accounts.
- my_accounts <user>
- sudo /usr/arcts/systems/scripts/addScratch.sh -a <slurm_account> -u <user>
- Run ansible sync to push the changes to /etc/passwd and /etc/group out to all the compute nodes:
- sudo /usr/arcts/systems/scripts/ansibleSync.sh
- Add the user to the MCommunity group hpc-users-greatlakes. This is necessary because it allows them to access Open OnDemand (greatlakes.arc-ts.umich.edu in a web browser).
*** This will create a placeholder for a user that does not have a user login on Great Lakes ***
- Run this command to add a specified user to a specified Slurm Account:
- sudo /usr/arcts/systems/scripts/addSlurmUser.sh -u <user> -a <slurm_account>
*** If the user was a member of lsa1, remove them from lsa1 once they have been added to their PI's account ***
*** Note: Make sure the user's default Slurm Account is NOT the one you're removing them from ***
*** To change a user's default Slurm account (systems will have to run this): sudo /opt/slurm/bin/sacctmgr modify user <user> set DefaultAccount=<slurm_account> where cluster=greatlakes ***
- Run this command to remove a specified user from a specified Slurm Account:
- sudo /usr/arcts/systems/scripts/delSlurmUser.sh -u <user> -a <slurm_account>
*** If the above command for some reason fails, the user can be removed with 'sacctmgr' ***
*** The cluster must be specified; otherwise the user is removed from the Slurm Account but not from the database (not entirely removed). Systems will need to run this ***
- sudo /opt/slurm/bin/sacctmgr remove user <user> cluster=greatlakes account=<slurm_account>
- Remove the user from the scratch directory for the account you took them out of
- sudo /usr/bin/gpasswd -d <user> <account>_root
- Run sudo /usr/arcts/systems/scripts/ansibleSync.sh
- Ask the systems team to remove the scratch
- Specific Slurm Account:
- sudo /usr/arcts/systems/scripts/addScratch.sh -a <slurm_account>
- Specific user under specific Slurm Account
- sudo /usr/arcts/systems/scripts/addScratch.sh -a <slurm_account> -u <user>
- Specific user
- sudo /usr/arcts/systems/scripts/addScratch.sh -u <user> *** NOT WORKING YET ***
*** The current process for creating a Slurm Account on Great Lakes (subject to change as this is going to be automated in the future) ***
*** Account requests can be viewed from TeamDynamix - it also sends an email with the subject "HPC Resource Request" that contains the "blob", or all the information given from the form. ***
- MCommunity group
The group is created automatically by the SN Account Request form. If you need to create the group manually, the steps are in the drop-down below:
Manual MCommunity Group Creation
- Go to mcommunity.umich.edu
- Go to "My Groups" and select "Create a group"
- Here is the required group setup:
- Group Name: arcts-<slurm_account>-admins
- Group E-Mail: arcts-<slurm_account>-admins
- Description: leave blank
- Joining the group: Owners must add members
- Members list is viewable by: Anyone
- Messages can be sent to the group by: Only Members
- Add Owners: hpc-systems, arcts-helpdesk, and one of these if the account is under the corresponding unit:
- LSA: lsait-ars-hpc-provisioning
- CoE: coe-arcts-hpc-admin
- Med: HITS-RAAC-HPC-Support
- Add the PI and any requested admins as members
- (Armis2 ONLY) If the account existed in Armis, bring over the members from the existing MCommunity group (likely "accountname-armis")
- Run this command to Create the Slurm Account:
- sudo /usr/arcts/systems/scripts/addSlurmAccount.sh -a <slurm_account> -o <organization>
- If the organization name has spaces in it, the name must be enclosed in quotes (e.g., "public health")
- You can find a list of organizations at the bottom of this section
- Change the description for the Slurm Account to show the name of the admin MCommunity group:
- sudo /opt/slurm/bin/sacctmgr modify account name=<slurm_account> set description=arcts-<slurm_account>-admins
- Add the user list to the Slurm Account:
*** This will create a placeholder for any user that does not have a user login on Great Lakes ***
- sudo /usr/arcts/systems/scripts/addSlurmUser.sh -u <user> -a <slurm_account>
- Create the Scratch directories for the Slurm Account:
- sudo /usr/arcts/systems/scripts/addScratch.sh -a <slurm_account>
- sudo /usr/arcts/systems/scripts/addScratch.sh -a <slurm_account> -u <user>
- sudo /usr/arcts/systems/scripts/addScratch.sh -u <user> *** NOT WORKING YET ***
- Add the account's "blob" into the SlurmOps Database from gl-build.arc-ts.umich.edu
- Navigate to: /nfs/turbo/arcts-hpc-support/arcts-support/GL (or A2)
- Create the blob file: nano $account_name
- Populate the file with the blob
- If the spending limit on the account is a one-time limit you will need to change "Spending Limit: " to None
- You need to manually change the date format from MM/DD/YYYY to YYYY-MM-DD
- You will need to manually change the School/College to meet the format of the Organizations in Slurm. All organization names are separated by spaces:
architecture urban planning arcts art design business dearborn dentistry education engineering environment sustainability information inst social research its kinesiology law school life sciences institute literature science arts matthaei botanical gardens medicine music theatre dance nursing office research pharmacy public health public policy rackham school root social work transportation research inst um dearborn um flint umor
- Insert the blob into the SlurmOps DB: sudo python3 /usr/arcts/systems/scripts/addSlurmOps.py "$blob"
- Query the database to ensure that the blob was ingested into the database - sudo python3 /usr/arcts/systems/scripts/querySlurmOps.py -a <account name> -c "cluster"
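The date rewrite in the blob is easy to get wrong by hand. The following is a minimal sketch, not part of the ARC tooling, that rewrites every MM/DD/YYYY date found in a blob's text as YYYY-MM-DD. The function name and the "End Date" field used in the usage note are hypothetical; the real blob layout may differ.

```python
import re
from datetime import date

# Matches dates written as MM/DD/YYYY (1- or 2-digit month and day).
_US_DATE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def us_dates_to_iso(text: str) -> str:
    """Return text with every MM/DD/YYYY date rewritten as YYYY-MM-DD."""

    def _swap(match: re.Match) -> str:
        month, day, year = (int(group) for group in match.groups())
        # date() raises ValueError for impossible dates such as 13/40/2021,
        # so a typo in the blob is reported instead of silently converted.
        return date(year, month, day).isoformat()

    return _US_DATE.sub(_swap, text)
```

For example, `us_dates_to_iso("End Date: 06/30/2021")` yields `"End Date: 2021-06-30"`. The spending limit and organization edits described above still have to be made by hand.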
The process for creating an account for a class is the same as described above. A few things to keep in mind:
- When naming a class account the format must be <department><course number><section number><semester>_class
- For example - eecs558s007w21_class
- Each student receives $60.91 to use for computing. The best way to assign this is to set a template user:
- sudo ./addSlurmUser.sh -a <class account name> -u template_user -b 60.91
- Set the template user first. Doing this will ensure that all users added after will inherit the limits from the template user
- If the PI for the class wishes to have other limits set such as a core limit or a gpu limit you can add them to the template user using the appropriate flags:
-b Slurm user billing limit (in dollars)
-c Slurm user cpu limit
-g Slurm user gpu limit
-j Slurm user running job limit
-m Slurm user memory limit
-C Slurm user per-job cpu limit
-M Slurm user per-job mem limit
-G Slurm user per-job gpu limit
-w Slurm user wallclock limit
The process for creating a UMRCP account is the same as any other account with a few differences:
- The UMRCP account must end in a 0, otherwise the script will error out.
- You must specify the -r option to indicate the account is a UMRCP account.
- You must also specify the -H option with the amount of CPU hours the account will get.
The script will convert the CPU hours into a billing limit and set it appropriately onto the account. Example:
$ sudo /usr/arcts/systems/scripts/addSlurmAccount.sh -a cgbriggs0 -o arcts -r -H 80000
Adding UMRCP Account cgbriggs0 to cgbriggs_root with limits ( GrpTRESMins=billing=12022222224 ) on greatlakes
/opt/slurm/bin/sacctmgr -i add account cgbriggs0 organization='arcts' parent=cgbriggs_root cluster=greatlakes Description='UMRCP' GrpTRESMins=billing=12022222224
 Adding Account(s)
  cgbriggs0
 Settings
  Description = umrcp
  Organization = arcts
 Associations
  A = cgbriggs0 C = greatlakes
 Settings
  GrpTRESMins = billing=12022222224
  Parent = cgbriggs_root
$ sacctmgr show account cgbriggs0
   Account                Descr                  Org
---------- -------------------- --------------------
 cgbriggs0                umrcp                arcts
$ sacctmgr list assoc account=cgbriggs0 format=cluster,account,grptresmins -P
Cluster|Account|GrpTRESMins
greatlakes|cgbriggs0|billing=12022222224
UMRCP accounts are allotted 80,000 core hours per year. This amount may be allocated 100% to Great Lakes, 100% to Armis2, or in some other ratio between the two. Pay close attention to the hours requested for the account you are creating.
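Before running addSlurmAccount.sh for a UMRCP request, the two rules above (the account name ends in 0, and the hours across clusters stay within the annual allotment) can be checked with a small helper. This is a sketch under those stated rules only; the function and parameter names are hypothetical and it does not call any ARC script.

```python
UMRCP_ANNUAL_CORE_HOURS = 80_000  # annual allotment stated above


def umrcp_request_problems(account: str, great_lakes_hours: int,
                           armis2_hours: int = 0) -> list[str]:
    """Return a list of problems with a UMRCP request (empty if none)."""
    problems = []
    if not account.endswith("0"):
        problems.append(f"account name {account!r} does not end in 0")
    if great_lakes_hours < 0 or armis2_hours < 0:
        problems.append("hours must not be negative")
    total = great_lakes_hours + armis2_hours
    if total > UMRCP_ANNUAL_CORE_HOURS:
        problems.append(
            f"{total} hours requested across clusters exceeds "
            f"{UMRCP_ANNUAL_CORE_HOURS}"
        )
    return problems
```

An empty list means the request passes both checks; the value given to -H on each cluster is then that cluster's share of the hours.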
*** Not able to do yet. Entry here for future documentation ***
If you mistakenly create a Slurm Account, you should remove all users from the Slurm Account. The account will still exist in Slurm, but this way users will not see the erroneous account when they run my_accounts.
*** Unix groups created for MCommunity groups must use the same GID (Group ID) ***
- Use GID from MCommunity group to create the local unix group with the following command:
- sudo /usr/sbin/groupadd -g <GID> <group_name>
- Add users to local unix groups with the following command:
- sudo /usr/sbin/usermod -aG <group_name> <user>
- Remove users from local unix groups with the following command:
- sudo /usr/sbin/gpasswd -d <user> <group_name>
- Accounts for use by a single PI (principal investigator) and their lab members: <uniqname><#> (e.g., dgkorth1)
- Accounts for use by multiple PIs and their lab members: <project_name>_project<#> (e.g., birdstudy_project1)
- Accounts for use by students in a class: <class_name><class#><class_semester>_class<#> (e.g., bio100w20_class1)
- Accounts for use by members of a department (i.e., a subject within a school, like Econ within LSA): <department_name>_dept<#> (e.g., bio_dept1)
- Accounts for use by an entity within a school that is not a department, but larger than a project (the entity may contain multiple projects): <school>_<entity_name><#> (e.g., lsa_birdinitiative1)
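The naming conventions above can be expressed as patterns to sanity-check a proposed account name. The sketch below is illustrative only: the regular expressions are an assumed reading of the conventions (lowercase letters and digits, letters-only uniqnames), and the function name is hypothetical.

```python
import re

# Order matters: the entity pattern would also match project/class/dept names,
# so the more specific suffixes are tested first.
_ACCOUNT_PATTERNS = [
    ("project", re.compile(r"^[a-z0-9]+_project\d+$")),
    # \d* because the class example earlier on this page has no trailing number.
    ("class", re.compile(r"^[a-z0-9]+_class\d*$")),
    ("department", re.compile(r"^[a-z0-9]+_dept\d+$")),
    ("entity", re.compile(r"^[a-z]+_[a-z0-9]+\d+$")),
    ("single PI", re.compile(r"^[a-z]+\d+$")),
]


def classify_account_name(name: str):
    """Return the account type a name matches, or None if it matches none."""
    for label, pattern in _ACCOUNT_PATTERNS:
        if pattern.match(name):
            return label
    return None
```

A result of None suggests the name should be reviewed against the list above before the account is created.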
- For increasing an account limit:
- First run sacctmgr show assoc account=<account name> format=GrpTRESMins%30
- This command will give you the current spending limit on the account. You need to divide this by 100,000 to get the actual dollar amount.
- Next, check the current spending limit in the blob for the account and in the spreadsheet. You will need to update both of these once you've adjusted the spending limit on the account.
- Take the new limit and subtract the old limit. Add this difference to the dollar amount you got by dividing the sacctmgr output by 100,000. For example, if the command output shows 65549902, divide that by 100,000 to get 655.49902. If the current limit on the account is $500 and it needs to be raised to $1000, add the $500 difference to 655.49902. This will give you 1155.49902.
- Plug the new limit number into this command:
- sudo /usr/arcts/systems/scripts/modifySlurmAccount.sh -a <account name> -b 1155.49902
- This will then update the spending limit on the account. You should run Ansible as a matter of habit:
- sudo /usr/arcts/systems/scripts/ansibleSync.sh
- After this is done make sure to update both the blob and the spreadsheet to reflect the new limit.
- To update the blob, cd to /nfs/turbo/arcts-hpc-support/arcts-support.
- Then cd to GL, A2, or LH, depending on where the account you need to modify lives.
- Open the account's blob with an editor of your choice. Change the spending limit, copy the entire contents of the file, save the file, and then resubmit the blob to the database with
- sudo python3 /usr/arcts/systems/scripts/addSlurmOps.py
- Once that command is on the screen immediately follow it with a double quotation mark ' " '. Paste in the contents of the blob file that you copied and finish with another double quotation mark ' " '. Then you can hit enter to submit the updated file to the database.
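Pasting the blob between quotation marks can break if the blob itself contains a double quote. As a hedged alternative sketch, the blob text can be passed as a single argument from Python, which avoids shell quoting entirely. The helper name is hypothetical; it assumes addSlurmOps.py takes the whole blob as one argument, as the quoted invocation above implies.

```python
ADD_SLURMOPS = "/usr/arcts/systems/scripts/addSlurmOps.py"


def add_slurmops_command(blob_text: str) -> list[str]:
    """Build the argument list that submits blob_text as one argument."""
    # A list (rather than a single string) means no shell parses the blob,
    # so quotes and newlines inside it are passed through unchanged.
    return ["sudo", "python3", ADD_SLURMOPS, blob_text]


# Usage sketch (not run here), with a hypothetical blob file name:
#   import subprocess
#   from pathlib import Path
#   blob = Path("GL/example_account1").read_text()
#   subprocess.run(add_slurmops_command(blob), check=True)
```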
- For decreasing an account limit:
- *** It is best to wait to decrease a limit until the beginning of a calendar month. Limits are reset on the 1st for accounts so a decrease will not interfere with any jobs. If the decrease is needed before the 1st of a month, check the existing job queue to ensure that the jobs in queue will not hit the lowered limit. If there is a concern that the cost of the jobs in queue will be an issue, inform the unit support person/PI that the limit will be lowered on the 1st. ***
- Follow all of the same steps for increasing a limit above. The only difference is that you will be subtracting instead of adding.
- For example, if the current limit on an account after running the sacctmgr command above shows 88895472, divide that by 100,000 to get 888.95472. If the current spending limit on the account is $1000 and it needs to be reduced to $750, you need to subtract $250 from the existing limit: 888.95472 - 250.00 will give you 638.95472.
- Enter the new value into the modifySlurmAccount.sh command:
- sudo /usr/arcts/systems/scripts/modifySlurmAccount.sh -a <account name> -b 638.95472
- Run Ansible and then adjust the blob and spreadsheet.
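The arithmetic for both increases and decreases is the same: convert the GrpTRESMins billing value to dollars, then add the difference between the requested and current spending limits. A minimal sketch follows, using Decimal to avoid floating-point rounding; the function name is hypothetical and the 100,000 divisor is taken from the steps above.

```python
from decimal import Decimal

BILLING_UNITS_PER_DOLLAR = Decimal(100_000)  # "divide by 100,000" above


def new_account_billing_value(grp_tres_mins_billing: int,
                              current_limit: str,
                              requested_limit: str) -> Decimal:
    """Return the dollar value to pass to modifySlurmAccount.sh -b.

    grp_tres_mins_billing: the billing number reported by sacctmgr.
    current_limit / requested_limit: spending limits in dollars, as strings.
    """
    reported_dollars = Decimal(grp_tres_mins_billing) / BILLING_UNITS_PER_DOLLAR
    # Positive for an increase, negative for a decrease.
    difference = Decimal(requested_limit) - Decimal(current_limit)
    return reported_dollars + difference
```

With the figures used above, raising $500 to $1000 from a reported 65549902 gives 1155.49902, and lowering $1000 to $750 from a reported 88895472 gives 638.95472.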
Very few accounts have user level spending limits and we do not create new ones without authorization from Matt Britt. The process for adjusting user limits is easy:
- Run sreport -T billing cluster AccountUtilizationByUser Accounts=<account name> Start=2020-01-06 End=<beginning of calendar month>
- This will give you all the usage for all users on an account through the end of the last month/billing period. Take this number and add the user's monthly spending limit.
- For example, after running the sreport command shown:
- Take the number in the 'Used' column and divide by 100,000. Then add the user's spending limit to the amount. In this example the user astridr used $837.84616 from Jan 1st through June 1st.
- Check on the spreadsheet for accounts created to find the account and user limits. In this example the iainboyd1 account has user level spending limits on all users. For astridr the monthly limit is $500 per month.
- Add the $500 to the usage up through the beginning of the month and that becomes the user's new limit: $500 + $837.84616 = $1337.84616.
- Run this command to make the change:
- sudo /usr/arcts/systems/scripts/modifySlurmUser.sh -a <account name> -u <user name> -b 1337.84616
- Run Ansible to push it through.
- **All the above is for reference since members of the unit support teams will send in updated user level limits at the beginning of each month. They will provide the modifySlurmUser command filled out for each change for each account. **
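For reference, the user-level calculation above can be sketched as follows. The function name is hypothetical, and the 'Used' value 83784616 is inferred from the $837.84616 figure in the example rather than taken from actual sreport output.

```python
from decimal import Decimal


def modify_slurm_user_command(account: str, user: str,
                              used_billing_units: int,
                              monthly_limit: str) -> str:
    """Build the modifySlurmUser.sh command for a user's new limit.

    used_billing_units: the 'Used' column from sreport (billing units).
    monthly_limit: the user's monthly spending limit in dollars.
    """
    used_dollars = Decimal(used_billing_units) / Decimal(100_000)
    new_limit = used_dollars + Decimal(monthly_limit)
    return (
        "sudo /usr/arcts/systems/scripts/modifySlurmUser.sh "
        f"-a {account} -u {user} -b {new_limit}"
    )
```

Calling `modify_slurm_user_command("iainboyd1", "astridr", 83784616, "500")` produces the command shown above with `-b 1337.84616`.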
*** If they have data in scratch, confirm they no longer need the data. If they need their data, they will need to move it so you can remove their scratch directory ***
- Tar the user's /home directory
- sudo /bin/tar czf /nfs/locker/flux-support/userstodelete/gl-username-YYYY-MM-DD.tar.gz /home/username
- Run the following commands:
*** This command will remove the user from an account. It will remove the user from Slurm completely if the account you provide is the very last account they are associated with. If the account is the only account the user belongs to (their default account) then you may need to poke systems to remove the user ***
- sudo /usr/arcts/systems/scripts/delSlurmUser.sh -a $account -u $user
*** This command will also remove the user's /home directory and mail spool ***
- sudo userdel -r username
- Run ansibleSync.sh
- sudo /usr/arcts/systems/scripts/ansibleSync.sh
- Add the user as normal
- sudo /usr/arcts/systems/scripts/addLinuxUser.sh username
- Restore their home directory
*** Check to see whether there is an archived home directory. If there is, make sure that you are in the root directory, because /home is part of the path for the home directory being restored. ***
- cd /
- sudo tar xzvf /nfs/locker/flux-support/userstodelete/gl-username-YYYY-MM-DD.tar.gz
