great lakes faq - raeker/ARC-Wiki-Test GitHub Wiki
Q: In my experience, most clusters have a background, very low priority queue with limited wall time that users can submit to and that does not count toward their resource usage. Are there any plans to offer such a feature on Great Lakes?
A: There are currently no plans for such a model, but it is possible in the future.
Q: Our team just purchased an 18-month allocation of 8 GPUs on Flux. Will these allocations automatically be converted over to Great Lakes without the need for further purchase?
A: There are no allocations on Great Lakes, so we will create an account for you and you can use Great Lakes in parallel with your use of Flux. Allocations on Flux are charged monthly, so when you are ready to transition completely to Great Lakes, let us know and we can terminate any Flux allocations at their monthly window, after which you will no longer be billed for them.
Q: Are there any plans to make a HIPAA compliant version (i.e. an "armis" version)?
A: We now have a HIPAA compliant Slurm-based cluster, Armis2. Armis2 has Intel Haswell processors in all nodes, and it uses Turbo for /scratch storage, as Armis did. Further information about Armis2 can be found here:
Q: I want to run VASP on Great Lakes.
A: VASP is already installed on Great Lakes, but it has a restricted
license. A user who has permission must be added to the `vasp` Unix group.
They will then be able to run VASP after running these commands:
`module load RestrictedLicense`
`module load vasp`
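Whether your account is already in the `vasp` group can be checked before loading the modules. The following is a minimal sketch using the standard `id` command; the helper name `in_group` is hypothetical.

```shell
# Hypothetical helper: succeed if the current user belongs to the named group.
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group vasp; then
    echo "You are in the vasp group."
else
    echo "You are not in the vasp group yet."
fi
```

A newly granted group membership usually shows up only after you log in again.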
Q: My job has been pending for days citing the reason (Priority)
A: This means that your job is successfully queued, is accruing priority, and is waiting to be scheduled. The cluster is being heavily utilized right now, so your job must wait for other jobs to finish before it can start. We are actively monitoring and tweaking the scheduler where needed, so please continue to let us know when you experience long start times.
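Slurm's own tools can show where a pending job stands. The sketch below wraps two standard Slurm commands, `squeue --start` and `sprio`; the function names are hypothetical, and any estimated start time the scheduler reports is only a rough guess that may change or be empty.

```shell
# Hypothetical helper: show the pending reason and the scheduler's current
# start-time estimate for one job.
job_start_estimate() {
    squeue --jobs="$1" --start
}

# Hypothetical helper: show the priority factors Slurm has computed for a
# pending job.
job_priority() {
    sprio --jobs="$1"
}

# Usage on a login node (replace the argument with your own job ID):
#   job_start_estimate <job id>
#   job_priority <job id>
```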
Q: My job is taking forever to start, and then when it does start it fails right away with an exit code of 1.
A: You may want to consider submitting short, small jobs to the debug
partition while you work on getting your code working properly. The
debug partition has limited job size, but should have quicker
time-to-start. Here is an example of a test job which uses the debug
partition:
#!/bin/bash
#SBATCH --account=<your PI's account>
#SBATCH --partition=debug
#SBATCH --time=5
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=64m
date
sleep 240
hostname
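Once the script above is saved, it could be submitted and its result inspected roughly as follows. This is a sketch: the file name `debug_test.sh` and the function names are assumptions, while `sbatch --parsable` and `sacct` are standard Slurm commands whose output depends on the site's accounting setup.

```shell
# Submit a batch script and print only the job ID. --parsable suppresses the
# "Submitted batch job" text; on multi-cluster setups the ID may be followed
# by ";cluster", which cut strips off.
submit_job() {
    sbatch --parsable "$1" | cut -d';' -f1
}

# Show the final state and exit code of a job, which helps explain an
# immediate failure such as an exit code of 1.
job_result() {
    sacct --jobs="$1" --format=JobID,JobName,Partition,State,ExitCode,Elapsed
}

# Usage on a login node (debug_test.sh is a hypothetical file name):
#   jobid=$(submit_job debug_test.sh)
#   job_result "$jobid"
```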
Q: I want to use <software> but it is not installed on Great Lakes. Would you please install <software> on Great Lakes?
A: Uncommon packages and utility software can typically be installed into the home directory of the individual who wants to use the software, without involving systems staff. Put another way, installing software packages system-wide is not typically required in order to use Linux software. In fact, relying on system-wide installation of packages is actually slower for you: there can be a significant delay between submitting a request and the software getting installed, you cannot update or modify the programs whenever you desire, the software is less likely to be up to date, and you have to submit a request and wait every time you need the software updated. Installing these packages yourself typically requires some small effort, but the packages often include easy instructions that should work on most systems without modification.
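For a package built from source, a per-user install often amounts to pointing the build at a directory you own. Here is a minimal sketch, assuming an autotools-style package and a hypothetical install prefix of `$HOME/.local`; the actual steps depend on the package's own instructions.

```shell
# Hypothetical per-user install prefix; any directory you own works.
PREFIX="${PREFIX:-$HOME/.local}"
mkdir -p "$PREFIX/bin"

# Typical source build for an autotools-style package, run from the unpacked
# source directory (left commented out because the package and its build
# system are assumptions):
#   ./configure --prefix="$PREFIX" && make && make install

# Make programs installed under the prefix visible to your shell and jobs.
export PATH="$PREFIX/bin:$PATH"
```

Adding the `export PATH` line to your shell startup file or job script keeps the installed programs available in later sessions.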
Q: I am interested in using machine learning on Great Lakes.
A: We have many resources for machine learning for you to use on our clusters. Some
of the software we have for machine learning includes:
tensorflow
keras
the Statistics and Machine Learning Toolbox in Matlab
torch
opencv
caffe
CSCAR (which is another division under ARC) is hosting an Introduction
to Deep Neural Networks with Keras/TensorFlow course on November 12th.
Registration is free and information about the course can be found
here:
https://arc-ts.umich.edu/event/introduction-to-deep-neural-networks-with-keras-tensorflow-5/
We also have a user guide for TensorFlow:
https://arc-ts.umich.edu/greatlakes/software/tensorflow/
In addition to all of the above, our clusters have many GPUs available for
machine learning. A short write-up about them can be found here:
https://arc-ts.umich.edu/new-titanv-gpus/
Q: My job needs to access resources over the internet.
A: The compute nodes on Great Lakes do not have direct access to the Internet. To
connect from a compute node, you need to go through a proxy server we have
set up. Add these lines to your sbatch script before you call your
program:
export http_proxy="http://proxy.arc-ts.umich.edu:3128/"
export https_proxy="http://proxy.arc-ts.umich.edu:3128/"
export ftp_proxy="http://proxy.arc-ts.umich.edu:3128/"
export no_proxy="localhost,127.0.0.1,.localdomain,.umich.edu"
export HTTP_PROXY="${http_proxy}"
export HTTPS_PROXY="${https_proxy}"
export FTP_PROXY="${ftp_proxy}"
export NO_PROXY="${no_proxy}"
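A job script can confirm that the proxy settings took effect before starting the real work. The following is a sketch; the helper name `check_proxy_env` is hypothetical, and the commented `curl` call against `example.org` is just one possible connectivity test.

```shell
# Hypothetical helper: report any proxy variable that is unset or empty.
# Returns non-zero if at least one is missing.
check_proxy_env() {
    missing=0
    for name in http_proxy https_proxy ftp_proxy no_proxy; do
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Possible connectivity test from a compute node (curl honors https_proxy;
# the target must be outside the no_proxy domains to exercise the proxy):
#   check_proxy_env && curl -sSI https://example.org | head -n 1
```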
Q: How do I see how much a job cost after it ran?
A: Use `my_job_statistics <job id>`.