Instructions for Sherlock - wenqiang-geophys/rupt2Dquad GitHub Wiki
These instructions are specific to the Stanford GP229 course.
Logging into Sherlock
ssh -X <YourSUNetID>@login.sherlock.stanford.edu
The -X is necessary for enabling X11 graphics forwarding.
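One quick way to confirm that forwarding took effect is to inspect the DISPLAY variable after logging in, since ssh sets it when X11 forwarding is active. The sketch below is only an illustration; the function name x11_status is made up for this example.

```shell
# Hypothetical helper: report whether an X11 display is visible to this shell.
x11_status() {
    if [ -n "${DISPLAY:-}" ]; then
        echo "X11 forwarding looks active (DISPLAY=$DISPLAY)"
    else
        echo "DISPLAY is not set; log in again with ssh -X"
    fi
}

x11_status
```

If DISPLAY is empty, MATLAB figures opened later on Sherlock will have nowhere to be drawn.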
After logging in, you will be on a login node that is unsuitable for any computing work. We request an interactive compute node with
sdev -p serc -t 01:10:10
which grants an interactive compute node in the serc partition with a time limit of 1 hour, 10 minutes, and 10 seconds. Adjust the time as needed, but keep in mind that you may have to wait longer if you ask for more time.
Installation
After logging into Sherlock and requesting an interactive compute node,
cd $SCRATCH
The $SCRATCH directory provides faster I/O (input/output), but its contents are automatically purged after 90 days. If you want to keep anything for use after this course, remember to move it elsewhere.
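To see which files may be approaching the purge window, a find command can list files by age. The sketch below assumes that purge eligibility is tied to file modification time, which you should confirm against the Sherlock documentation; the helper name list_stale_files and the 60-day threshold are illustrative choices.

```shell
# Hypothetical helper: list regular files under directory $1 that were last
# modified more than $2 days ago (default 60, an assumed warning threshold).
list_stale_files() {
    find "$1" -type f -mtime +"${2:-60}" -print
}

# Only scan when $SCRATCH is defined, as it is on Sherlock.
if [ -n "${SCRATCH:-}" ] && [ -d "$SCRATCH" ]; then
    list_stale_files "$SCRATCH" 60
fi
```

Anything the listing reports that you still need can be moved to a location that is not purged.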
We obtain the code from GitHub:
git clone https://github.com/wenqiang-geophys/rupt2Dquad.git
Please make sure to use the command above, as this will allow you to get additional files easily when we add them later in the quarter (see Updating the code below).
Change into the downloaded directory:
cd rupt2Dquad
Compile the code with
bash build_on_sherlock.sh
You may need to give the file execute permission:
chmod +x build_on_sherlock.sh
After a successful build, an executable exe_solver should appear in the bin directory.
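A small check can confirm that the build produced the executable before you move on. This is a minimal sketch; the helper name check_solver is hypothetical, and the only path it relies on is the bin/exe_solver location stated above.

```shell
# Hypothetical helper: confirm the solver exists and is executable under the
# repository root passed as $1.
check_solver() {
    if [ -x "$1/bin/exe_solver" ]; then
        echo "found $1/bin/exe_solver"
    else
        echo "missing $1/bin/exe_solver; re-run build_on_sherlock.sh" >&2
        return 1
    fi
}

# Run from the top of the cloned repository; the result is informational only.
check_solver . || true
```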
Running
Locate the gp229 directory within rupt2Dquad. This is where we will add more simulation cases for your homework in the future. For now, let's learn how to run the code with the example tpv14.
cd gp229/tpv14/
To run the code, we simply submit a batch job to Sherlock's job scheduler
sbatch submit.sh
Within submit.sh you can find several useful settings for running the code:
job-name: helps you find the job in the list of jobs running on Sherlock.
output: the Slurm output file, which may contain valuable information if the job fails.
ntasks: number of MPI processes requested. For our problem size, 1 task per partitioned mesh (the number of mesh files in the data folder) should be enough.
ntasks-per-node: how to split the tasks among different nodes. For this example, ntasks-per-node = ntasks, which forces all tasks onto the same node. This can reduce communication overhead, but may increase queueing time.
cpus-per-task: number of CPUs to use per task. Keep at 1. A node has many CPUs (32 for a standard Sherlock node).
time: time requested for the job. Adjust as needed.
partition: the Sherlock partition in which to run the job. Keep as serc, for Stanford Earth Research Computing.
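As an illustration of how the settings above typically appear as #SBATCH directives, here is a hedged sketch. Every value is an assumption chosen for the example, and the actual submit.sh in the repository may differ.

```shell
#!/bin/bash
# Illustrative header only; the values are assumptions, not a copy of the
# course's submit.sh.
# Label shown in the NAME column of squeue:
#SBATCH --job-name=tpv14
# Log file; %j expands to the job ID:
#SBATCH --output=slurm-%j.out
# Assumed: one MPI task per mesh partition, all on one node:
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
# Assumed wall-clock limit:
#SBATCH --time=00:30:00
#SBATCH --partition=serc

# Slurm exports these variables inside a job; outside one they are unset.
ntasks_seen="${SLURM_NTASKS:-unset}"
echo "job ${SLURM_JOB_ID:-unset} running with ${ntasks_seen} task(s)"

# The solver launch line from the course's submit.sh would follow here.
```

Echoing the Slurm variables at the top of a job script puts the granted resources into the output file, which helps when diagnosing a failed run.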
Once the job is submitted, use
squeue -u <YourSUNetID>
to check the job status. The job appears under the NAME set by job-name in submit.sh. To cancel the job before it finishes running, note its JOBID and run
scancel JOBID
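Rather than reading the JOBID off the squeue listing, it can be captured at submission time from the confirmation line that sbatch prints. The helper below is a minimal sketch; the name parse_job_id is hypothetical, and it assumes the usual "Submitted batch job <id>" message.

```shell
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation
# line, which has the form "Submitted batch job 123456".
parse_job_id() {
    echo "$1" | awk '/^Submitted batch job/ {print $4}'
}

# Only attempt a submission where Slurm and the script are actually present.
if command -v sbatch >/dev/null 2>&1 && [ -f submit.sh ]; then
    job_id=$(parse_job_id "$(sbatch submit.sh)")
    echo "submitted job $job_id; cancel with: scancel $job_id"
fi
```

sbatch also accepts a --parsable flag that prints only the job ID, which avoids the parsing step.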
Changing simulation parameters
Often you will need to change one or more simulation parameters. Before doing so, make sure your working directory, e.g. hw3_p2, doesn't contain a data directory, which stores the mesh and output from a previous simulation. I recommend first copying this folder to a descriptive name such as:
cp -r data data_tau2_63MPa
to indicate the parameters used for the previous simulation. After copying, delete the original data folder. Now we are ready to change parameters.
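The archive step above can also be done as a single rename, which leaves no data folder behind to delete. The function below is a sketch under that assumption; the name archive_data and the label format are illustrative, and it refuses to overwrite an archive that already exists.

```shell
# Hypothetical helper: rename ./data to ./data_<label> in the current working
# directory, refusing to overwrite an existing archive.
archive_data() {
    label=$1
    if [ ! -d data ]; then
        echo "no data directory to archive" >&2
        return 1
    fi
    if [ -e "data_$label" ]; then
        echo "data_$label already exists; choose another label" >&2
        return 1
    fi
    mv data "data_$label"
}

# Example call (the label is illustrative):
# archive_data tau2_63MPa
```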
On an interactive compute node, first load the necessary modules:
ml matlab metis
Then, navigate to the rupt2Dquad home directory and run
bash setenv.sh
Back in your working directory, e.g. hw3_p2, use your favorite text editor, such as vim, to change the parameters in conf_stress.m that are labelled for you to explore. Then, use MATLAB to run run_preprocess.m. This generates the data folder, in which the pre-generated finite-element mesh is populated with values according to the settings in conf_stress.m. Now, without changing the name of the data folder,
sbatch submit.sh
should submit this job.
Analyzing simulation results
We provide MATLAB scripts to plot and analyze the simulation results. Feel free to do this step with Python.
On an interactive compute node, first load MATLAB
ml matlab
Then start it with the GUI, which can be very slow and is not recommended:
matlab
or without the desktop, which still allows graphics forwarding:
matlab -nodesktop -nosplash
On the MATLAB command line, type the script name without the .m extension:
draw_wave
to visualize the fault slip.
Then, try
draw_fault
[Figure: the output of draw_fault]
and
figure;plot(t,v(170,:))
[Figure: the output of plot(t,v(170,:))]
If you encounter the error "Unrecognized function or variable", run
addpath(genpath('../../mscripts'))
on the MATLAB command line, or add it to the beginning of a plotting script.
Updating the code
Within the rupt2Dquad directory:
git pull
Although we won't do much coding, if you track your changes in your local repo with git add and git commit, you might encounter merge conflicts with the remote repo. You can resolve them however you like, since it is your local repo. To avoid this altogether, simply don't track your changes with git for this class.
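Before pulling, it can help to see whether any tracked files carry local edits, since git may refuse to pull when an incoming change touches a file you have modified. The check below is a minimal sketch; the helper name tracked_changes is hypothetical, and it reports only uncommitted edits, not local commits.

```shell
# Hypothetical helper: list tracked files with uncommitted modifications in
# the repository at $1. An empty result means no local edits to tracked files.
tracked_changes() {
    git -C "$1" status --porcelain --untracked-files=no
}

# Run from inside the repository before pulling.
if command -v git >/dev/null 2>&1 &&
   git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    if [ -z "$(tracked_changes .)" ]; then
        echo "no local edits to tracked files"
    else
        echo "local edits present:"
        tracked_changes .
    fi
fi
```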
A second way to get the updated files is to download them from the repo on this website and upload them to Sherlock:
rsync -avP new_folder <YourSUNetID>@dtn.sherlock.stanford.edu:$SCRATCH/rupt2Dquad/gp229/