lighthouse slurm support - raeker/ARC-Wiki-Test GitHub Wiki
*** These changes must be run on lh-build.arc-ts.umich.edu ***
*** Any changes made must be pushed to the rest of the cluster using: sudo /usr/arcts/systems/scripts/ansibleSync.sh ***
1) Run this command to create a login for a specified user on the operating system:
- sudo /usr/arcts/systems/scripts/addLinuxUser.sh <user>
*** You can pass several users as a comma-separated list. See below ***
- sudo /usr/arcts/systems/scripts/addLinuxUser.sh <user1>,<user2>,<user3>
2) Check to see if the user belongs to any Slurm Accounts. If they do, create their scratch directories for those accounts.
- my_accounts <user>
- sudo /usr/arcts/systems/scripts/addScratch.sh -a <slurm_account> -u <user>
3) Run ansible sync to push the changes to /etc/passwd and /etc/group out to all the compute nodes:
- sudo /usr/arcts/systems/scripts/ansibleSync.sh
4) Add the user to the MCommunity group hpc-users-lighthouse. This is required for access to Open OnDemand (lighthouse.arc-ts.umich.edu in a web browser).
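Steps 1 to 3 above can be sketched as a dry-run helper that only prints the commands for review instead of running them. The function name `onboard_plan` is hypothetical. The script paths are the ones listed on this page, and the sketch assumes the scratch script is named `addScratch.sh`, as in the scratch commands further down.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from steps 1-3, it does not execute them.
# onboard_plan is a hypothetical name; the script paths come from this page.
SCRIPTS=/usr/arcts/systems/scripts

onboard_plan() {
    users=$1          # comma-separated logins, e.g. "alice,bob"
    account=${2:-}    # Slurm account (optional)

    # Step 1: a single call creates every login in the list.
    echo "sudo $SCRIPTS/addLinuxUser.sh $users"

    # Step 2: scratch directories are created per user and account.
    if [ -n "$account" ]; then
        old_ifs=$IFS
        IFS=,
        for u in $users; do
            echo "sudo $SCRIPTS/addScratch.sh -a $account -u $u"
        done
        IFS=$old_ifs
    fi

    # Step 3: push /etc/passwd and /etc/group to the compute nodes.
    echo "sudo $SCRIPTS/ansibleSync.sh"
}
```

For example, `onboard_plan alice,bob myaccount` prints one addLinuxUser.sh line, one addScratch.sh line per user, and the final ansibleSync.sh line. Step 4 (the MCommunity group) is not covered by this sketch.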
*** This will create a placeholder for a user that does not have a user login on Lighthouse ***
1) Run this command to add a specified user to a specified Slurm Account:
- sudo /usr/arcts/systems/scripts/addSlurmUser.sh -u <user> -a <slurm_account>
*** Note: Make sure the user's default Slurm Account is NOT the one you're removing them from ***
*** To change the user's default Slurm account (systems will have to run this): ***
- sudo /opt/slurm/bin/sacctmgr modify user <user> set DefaultAccount=<slurm_account> where cluster=greatlakes
- Run this command to remove a specified user from a specified Slurm Account:
- sudo /usr/arcts/systems/scripts/delSlurmUser.sh -u <user> -a <slurm_account>
*** If the above command fails for some reason, the user can be removed with 'sacctmgr' ***
*** The cluster must be specified; otherwise the user will be removed from the Slurm Account but not from the database (not entirely removed). Systems will need to run this. ***
- sudo /opt/slurm/bin/sacctmgr remove user <user> cluster=greatlakes account=<slurm_account>
- Remove the user from the scratch directory group for the account you took them out of:
- sudo /usr/bin/gpasswd -d <user> <account>_root
- Run sudo /usr/arcts/systems/scripts/ansibleSync.sh
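The removal steps above, including the default-account check, can be sketched as a dry-run helper that prints the commands rather than running them. The name `removal_plan` is hypothetical, and the sketch assumes the caller has already looked up the user's current default Slurm Account and passes it as the third argument.

```shell
#!/bin/sh
# Dry-run sketch: prints the removal commands, it does not execute them.
# removal_plan is a hypothetical name; paths and commands come from this page.
SCRIPTS=/usr/arcts/systems/scripts

removal_plan() {
    user=$1
    account=$2
    default_account=$3   # the user's current default Slurm Account

    # Guard: the default account must not be the one being removed.
    if [ "$default_account" = "$account" ]; then
        echo "refusing: $account is the default Slurm Account for $user; change it first" >&2
        return 1
    fi

    echo "sudo $SCRIPTS/delSlurmUser.sh -u $user -a $account"
    echo "sudo /usr/bin/gpasswd -d $user ${account}_root"
    echo "sudo $SCRIPTS/ansibleSync.sh"
}
```

The default account can usually be read with `sacctmgr -n -P show user <user> format=DefaultAccount`; check this against the local Slurm version before relying on it.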
1) Specific Slurm Account:
- sudo /usr/arcts/systems/scripts/addScratch.sh -a <slurm_account>
2) Specific user under specific Slurm Account:
- sudo /usr/arcts/systems/scripts/addScratch.sh -a <slurm_account> -u <user>
3) Specific user:
- sudo /usr/arcts/systems/scripts/addScratch.sh -u <user> *** NOT WORKING YET ***
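While the user-only form is not working, one possible workaround is to call form 2 once per Slurm Account. The sketch below only prints those commands. The name `scratch_plan` is hypothetical, and the account list is assumed to be copied by hand from the output of `my_accounts <user>`, since that output format is not described here.

```shell
#!/bin/sh
# Dry-run sketch: prints one addScratch.sh command per account for one user.
# scratch_plan is a hypothetical name; the script path comes from this page.
SCRIPTS=/usr/arcts/systems/scripts

scratch_plan() {
    user=$1
    shift                      # remaining arguments are Slurm Accounts
    for account in "$@"; do
        echo "sudo $SCRIPTS/addScratch.sh -a $account -u $user"
    done
}
```

For example, `scratch_plan alice acct1 acct2` prints two addScratch.sh commands, one per account. Run ansibleSync.sh afterwards if the changes need to reach the rest of the cluster.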