Bash Tutorial on Compute Canada
StatsGen Lab Meeting, March 19th, 2024 -- by Amanda Meuser
After logging in, you should be in your home directory. This is denoted by a tilde (~) beside your username and the login node you're on.
Try running the command below. This will print your working directory (the one you're currently in).
pwd
Now try running the command below. It shows us all the files and subdirectories in our current working directory.
ls
What do these flags do? What's the difference between them?
ls -l
ls -lh
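If you want to check your answer, here's a small sketch you can run anywhere. It assumes nothing about your account: demo_dir and big_file.txt are made-up names, and everything happens in a throwaway temporary directory.

```shell
# Work in a throwaway directory so nothing in your real folders is touched
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Make a file of exactly 2,000,000 bytes so the size column is easy to read
head -c 2000000 /dev/zero > big_file.txt

ls        # names only
ls -l     # long format: permissions, owner, size in bytes, date, name
ls -lh    # same long format, but sizes are shown in K, M or G
```

Compare the size column between the last two commands. The other columns should be identical.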
We use cd to change directories. We type cd and then the location we're moving to. We can try moving into scratch (temporary file storage and optimal location for running jobs)
cd scratch
Try typing ls to see what's in the scratch directory. What happens?
If we want to move back up one directory, we can either type in the full, absolute path of the location we want to move to, or the relative path. (Think formal mailing address vs telling someone how to get to York Lanes from the Farquharson building)
Let's try typing in the absolute path for our home directory, to move there
cd /home/<username>
(ex. /home/ameuser)
Now try moving back to scratch and then move up one using the relative path. Also, you can use tab to autofill a directory name. Try typing "scr" then hitting tab to autofill the rest of "scratch".
cd scratch
cd ..
Where are we now? .. means the parent directory (one level up), while . means the current working directory.
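Here's a small sketch of absolute vs relative paths that you can run without touching your real home directory. The variable base is a made-up stand-in for /home/<username>, created in a temporary location.

```shell
# Hypothetical stand-in for your home directory, with a scratch folder inside
base=$(mktemp -d)
mkdir "$base/scratch"

cd "$base/scratch"     # absolute path: works no matter where you start from
cd ..                  # relative path: go up to the parent directory
after_dotdot=$(pwd -P)

cd ./scratch           # "." is the current directory, so this is the same as "cd scratch"
cd .                   # "." on its own goes nowhere
after_dot=$(pwd -P)

echo "After cd ..     : $after_dotdot"
echo "After cd ./scratch and cd . : $after_dot"
```

The second path printed should be the first one with /scratch added to the end.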
Because your home directory is a special starting place, there's also a shortcut specific to it. Try going into scratch again, then using the ~ as a shortcut.
cd scratch
cd ~
Let's create a document in our scratch folder, using my go-to text editor, nano. There are other ones, but this is what I use.
cd scratch
nano test.txt
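Type some stuff in it, then save with Ctrl+S (or Ctrl+O then Enter, which works in every version of nano), and exit with Ctrl+X. View the file with head and less (tail does the opposite of head, showing the end of the file instead of the start, but the difference isn't obvious for small files).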
head test.txt
less test.txt
(hit q to exit)
What's the difference between these two commands?
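The difference between head and tail is easier to see on a longer file. This is a minimal sketch, assuming a made-up 100-line file called numbers.txt in a temporary directory.

```shell
demo_dir=$(mktemp -d)
seq 1 100 > "$demo_dir/numbers.txt"   # 100 lines, one number per line

head "$demo_dir/numbers.txt"          # first 10 lines by default
tail "$demo_dir/numbers.txt"          # last 10 lines by default

# -n sets how many lines to show
first_three=$(head -n 3 "$demo_dir/numbers.txt")
last_line=$(tail -n 1 "$demo_dir/numbers.txt")
echo "$first_three"
echo "$last_line"

# less is interactive, so run it by hand: less numbers.txt (hit q to exit)
```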
To create a directory, use mkdir
mkdir stuff
cd stuff
pwd
We can delete the test.txt file with rm, and use rm -r for deleting directories. Be careful with rm!! There's no recycling bin. When you delete things, they're gone.
cd ..
rm test.txt
rm -r stuff
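If you'd like to practise deleting things somewhere harmless first, here's a sketch that rebuilds the same test.txt and stuff inside a temporary directory (demo_dir is a made-up name) and shows that rm on its own won't remove a directory.

```shell
demo_dir=$(mktemp -d)
cd "$demo_dir"
touch test.txt
mkdir stuff

ls                  # look before you delete
rm test.txt         # removes the file
rm stuff 2>/dev/null || echo "rm alone refuses to delete a directory"
rm -r stuff         # -r (recursive) removes the directory and everything in it
ls                  # nothing left
```

Adding -i (for example rm -i test.txt) makes rm ask for confirmation before each deletion, which is a handy habit while you're learning.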
To run jobs on the command line, we want to start an interactive session, aka move to a compute node. These settings give us one hour and 4000 MB (about 4 GB) of memory.
srun --pty --account="def-mcfarlas" -t 0-01:00:00 --mem=4000 /bin/bash
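One way to check whether the interactive session actually started is to look at the SLURM_JOB_ID environment variable, which Slurm sets inside a job. This is just a sketch; session_status is a made-up variable name.

```shell
# Print the name of the machine we're on (a login node or a compute node)
uname -n

# Slurm sets SLURM_JOB_ID inside a job; outside of one it is empty
if [ -n "${SLURM_JOB_ID:-}" ]; then
    session_status="inside Slurm job $SLURM_JOB_ID"
else
    session_status="not inside a Slurm job (probably still on a login node)"
fi
echo "$session_status"
```

When the hour is up, or when you type exit, you'll be dropped back onto the login node.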
We can now run scripts on the command line
module load r
Rscript plotting_DIC.R EGM19_cc_allSpecies_DIC_trimmed.txt
Yay it should have created some PDFs!!
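The R script names its PDFs after the input file, with the extension removed. Here's a rough shell equivalent of that naming step, so you know what to look for. The variable names are made up for this sketch.

```shell
input_file="EGM19_cc_allSpecies_DIC_trimmed.txt"

# Strip any leading directories and the .txt extension
stem=$(basename "$input_file" .txt)

first_pdf="${stem}_plot.pdf"
second_pdf="${stem}_plot_extra.pdf"
echo "$first_pdf"
echo "$second_pdf"

# List any PDFs in the current directory, with human-readable sizes
ls -lh ./*.pdf 2>/dev/null || echo "No PDFs here yet"
```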
We can look at the startingvals_loop_queuesub.sh script and talk about what an sbatch script needs to have to be submitted to the Slurm scheduler.
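As a starting point for that discussion, here's a minimal sketch of an sbatch script. It is not the lab's script: the job name and the file name hello_slurm.sh are made up, the account is borrowed from the srun example above, and your own jobs will likely need different time and memory settings. The key pieces are the #!/bin/bash line at the very top, the #SBATCH lines right after it, and then the commands to run.

```shell
#!/bin/bash
#SBATCH --account=def-mcfarlas   # allocation to charge (borrowed from the srun example)
#SBATCH --time=0-00:15:00        # wall time limit (D-HH:MM:SS)
#SBATCH --mem=4000M              # memory per node
#SBATCH --job-name=hello_slurm   # hypothetical name to display in the queue

# Everything below runs on the compute node once the job starts
message="Hello from $(uname -n)"
echo "$message"
```

If you saved this as hello_slurm.sh, you would submit it with sbatch hello_slurm.sh. Bash treats the #SBATCH lines as ordinary comments, so you can also test the commands with bash hello_slurm.sh before submitting.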
Another tip that I haven't mentioned is that a single right click will paste anything that's in your clipboard
# script for plotting DIC values from entropy runs
# Amanda Meuser -- March 2023
# USAGE: Rscript plotting_DIC.R file
# load packages
print("Loading packages...")
library(tidyverse)
library(tools)
# import DIC file
args <- commandArgs(TRUE)
file <- args[1]
DIC <- read.delim(file, header=T, sep = "\t")
print("Dimensions of file:")
dim(DIC)
print("Here's a sneak peek:")
head(DIC)
# average across the 3 reps
DIC_avg <- DIC %>% group_by(k) %>% summarize_all(mean)
# remove useless rep column
DIC_avg <- DIC_avg[,-2]
print("Dimensions of file:")
dim(DIC_avg)
print("Here's a sneak peek:")
head(DIC_avg)
print("The optimal value of k is:")
DIC_avg$k[which.min(DIC_avg$Model_DIC)]
basename1 <- basename(file)
basename <- file_path_sans_ext(basename1)
print("Basename is:")
basename
print("Creating plots...")
pdf(paste0(basename,"_plot.pdf"), width=11, height=11)
ggplot(DIC_avg, aes(k, Model_DIC)) + geom_point()
dev.off()
pdf(paste0(basename,"_plot_extra.pdf"), width=11, height=11)
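# par(mfrow=...) only affects base R graphics, not ggplot2, so it is left out; each plot below gets its own page in the PDF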
ggplot(DIC_avg, aes(k, Model_deviance)) + geom_point()
ggplot(DIC_avg, aes(k, Effective_number_of_parameters)) + geom_point()
ggplot(DIC_avg, aes(k, Model_DIC)) + geom_point()
dev.off()
k rep Model_deviance Effective_number_of_parameters Model_DIC
k1 rep1 116723.64 46280.78 163004.42
k1 rep2 116715.08 44934.97 161650.04
k1 rep3 116701.76 51221.82 167923.58
k2 rep1 93934.44 111045.87 204980.31
k2 rep2 93969.86 90658.80 184628.66
k2 rep3 94055.12 115396.61 209451.72
k3 rep1 93594.66 70059.74 163654.40
k3 rep2 93537.24 74041.86 167579.10
k3 rep3 92376.52 70654.95 163031.47
k4 rep1 91159.48 104401.82 195561.30
k4 rep2 91155.57 125011.60 216167.18
k4 rep3 91292.55 124920.52 216213.07
k5 rep1 89198.25 81986.15 171184.40
k5 rep2 89371.42 90499.67 179871.09
k5 rep3 89965.51 94692.05 184657.56
#!/bin/bash
##USAGE: sbatch startingvals_loop_queuesub.sh k rep /path/to/mpgl_and_ldak
##example: sbatch ../../startingvals_loop_queuesub.sh 8 3 /project/rrg-emandevi/hybrid_ameuser/AMP22/starting_values_entropy
## Tips:
## modify k and starting values script/path on command line
## do NOT put a / at the end of the path on command line
### ---------- Job configuration --------------------------------------------
# Run dependent and permanent parameters
# will be run on complete nodes NOT partial
#SBATCH --nodes=1 # number of nodes to use
#SBATCH --time=00-0:15:00 # time (DD-HH:MM:SS)
#SBATCH --account=rrg-emandevi # account name
#SBATCH --job-name="loop_script" # name to display in queue
#SBATCH --ntasks-per-node=1 # tasks per node (one core per node)
#SBATCH --mem=4000M # memory per node
#SBATCH --mail-user=<your_email> # who to email
#SBATCH --mail-type=ALL # when to email
for k in $(seq 5 $1) ##start at 5, sequentially go up by 1 till it gets to value of $1
do
for rep in $(seq 1 $2)
do
##echo $k $rep #was for testing!
sbatch /project/rrg-emandevi/hybrid_ameuser/entropy_queuesub_27jul22.sh $3/*.mpgl $k $rep $3/qk"$k"inds.txt
done
done
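Before submitting a loop like this for real, it can help to do a dry run that only prints what would be submitted. The sketch below swaps sbatch for echo and uses made-up stand-ins (max_k, n_reps, data_dir) for the three command-line arguments.

```shell
# Hypothetical stand-ins for $1, $2 and $3
max_k=6
n_reps=2
data_dir=/path/to/mpgl_and_ldak

job_count=0
for k in $(seq 5 "$max_k"); do           # k starts at 5, as in the script above
    for rep in $(seq 1 "$n_reps"); do
        # echo instead of sbatch, so nothing is actually submitted
        echo "would submit: k=$k rep=$rep starting values=$data_dir/qk${k}inds.txt"
        job_count=$((job_count + 1))
    done
done
echo "Total jobs: $job_count"
```

With these values the loop covers k = 5 and 6 with 2 reps each, so it reports 4 jobs.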