Bash Tutorial on Compute Canada - erynmcfarlane/StatsGenLabProtocols GitHub Wiki

PowerPoint about Compute Canada

Demo on Cedar

StatsGen Lab Meeting, March 19th, 2024 -- by Amanda Meuser

After logging in, you should be in your home directory. This is shown by a tilde (~) beside your username and the name of the login node you're on.

Try running the command below. This will print your working directory (the one you're currently in)

pwd

Now try running the commands below. They show us all files or subdirectories of our current working directory.

ls

What do these flags do? What's the difference between them?

ls -l

ls -lh
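If you want to check your answer, here is a small sketch you can run anywhere. It assumes a throwaway directory made with mktemp and a made-up file name (big_file.bin), so nothing in your real folders is touched.

```shell
# Work in a throwaway directory so nothing in your real folders is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir"

# Make a file of exactly 2,000,000 bytes (hypothetical name) so sizes are easy to compare.
head -c 2000000 /dev/zero > big_file.bin

# -l: long format (permissions, owner, size in bytes, date, name).
long_listing=$(ls -l big_file.bin)
echo "$long_listing"

# -lh: the same long format, but sizes are shown in human-readable units (K, M, G).
human_listing=$(ls -lh big_file.bin)
echo "$human_listing"

# Go back to where we started and clean up.
cd - > /dev/null
rm -r "$demo_dir"
```

Compare the size column in the two listings: that column is the only thing that should change.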

We use cd to change directories. We type cd and then the location we're moving to. Let's try moving into scratch (temporary file storage, and the best place to run jobs from).

cd scratch

Try typing ls to see what's in the scratch directory. What happens?

If we want to move back up one directory, we can either type in the full, absolute path of the location we want to move to, or the relative path. (Think formal mailing address vs telling someone how to get to York Lanes from the Farquharson building)

Let's try typing in the absolute path for our home directory, to move there

cd /home/<username> (ex. /home/ameuser)

Now try moving back to scratch and then move up one using the relative path. Also, you can use tab to autofill a directory name. Try typing "scr" then hitting tab to autofill the rest of "scratch".

cd scratch

cd ..

Where are we now? .. means the parent directory (one level up), while . means the current working directory. Because your home directory is a special starting place, there's also a shortcut specific to it. Try going into scratch again, then using ~ as a shortcut.

cd scratch

cd ~
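Here is a minimal sketch that puts the absolute path, the relative path, and the shortcuts side by side. It assumes a throwaway directory (fake_home, a made-up name) standing in for your home directory, so you can run it without touching your real scratch folder.

```shell
# Throwaway directory standing in for your home directory (hypothetical layout).
fake_home=$(mktemp -d)
mkdir "$fake_home/scratch"

# Absolute path: starts with / and works no matter where you currently are.
cd "$fake_home/scratch"
pwd

# Relative path: .. is the parent of wherever you are right now.
cd ..
after_dotdot=$(pwd)
echo "$after_dotdot"

# . is the current directory, so this does not move you at all.
cd .
after_dot=$(pwd)
echo "$after_dot"

# ~ expands to your real home directory, whatever directory you are in.
cd ~ && pwd

# Clean up the throwaway directory.
rm -r "$fake_home"
```

The same cd .. would land somewhere different if you started from another directory, which is the whole difference between relative and absolute paths.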

Let's create a document in our scratch folder, using my go-to text editor, nano. There are other ones, but this is what I use.

cd scratch

nano test.txt

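Type some stuff in it, then save with Ctrl+S (or Ctrl+O followed by Enter, which works in every version of nano), and exit with Ctrl+X. View the file with head and less (tail does the opposite of head and shows the end of the file, but the difference isn't obvious for small files).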

head test.txt

less test.txt (hit q to exit)

What's the difference between these two commands?
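The difference is easier to see on a longer file. This is just a sketch: it assumes a made-up 100-line file (numbers.txt) in a throwaway directory.

```shell
# Build a 100-line practice file (hypothetical name) in a throwaway directory.
demo_dir=$(mktemp -d)
seq 1 100 > "$demo_dir/numbers.txt"

# head prints the start of the file (10 lines by default); -n changes how many.
first_three=$(head -n 3 "$demo_dir/numbers.txt")
echo "$first_three"

# tail does the same thing from the end of the file.
last_three=$(tail -n 3 "$demo_dir/numbers.txt")
echo "$last_three"

# less opens the whole file in a scrollable viewer (arrow keys to move, q to quit).
# It is interactive, so it is left commented out here:
# less "$demo_dir/numbers.txt"

# Clean up.
rm -r "$demo_dir"
```

head and tail print a few lines and hand the prompt straight back, while less keeps the file open until you quit.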

To create a directory, use mkdir

mkdir stuff

cd stuff

pwd

We can delete the test.txt file with rm, and use rm -r to delete directories. Be careful with rm!! There's no recycling bin. When you delete things, they're gone.

cd ..

rm test.txt

rm -r stuff
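If you want to practice deleting without any risk, here is a sketch that does everything inside a throwaway directory. The file names are made up for the example.

```shell
# Everything happens inside a throwaway directory.
demo_dir=$(mktemp -d)
cd "$demo_dir"
mkdir stuff
touch test.txt stuff/notes.txt

# Look before you delete: ls shows exactly what is in the directory.
ls stuff

# rm removes files.
rm test.txt

# On a directory, plain rm refuses and prints an error (hidden here).
rm stuff 2> /dev/null || echo "rm without -r will not remove a directory"

# rm -r removes the directory and everything inside it.
rm -r stuff

# Count what is left in the demo directory.
remaining=$(ls -A | wc -l | tr -d ' ')
echo "entries left: $remaining"

# rm -i asks for confirmation before each deletion; it is interactive, so it is only noted here.

# Go back and remove the (now empty) demo directory.
cd - > /dev/null
rmdir "$demo_dir"
```

Running ls on the exact path you are about to pass to rm is a cheap habit that catches most accidents.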

To run jobs on the command line, we want to start an interactive session, aka move to a compute node. These settings give us one hour and 4000 MB (about 4 GB) of memory.

srun --pty --account="def-mcfarlas" -t 0-01:00:00 --mem=4000 /bin/bash
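To see what each piece of that command controls, here is a sketch that builds the same request from named variables and prints it instead of running it. The variable names are made up for illustration, and the memory value is written with an explicit M suffix, which Slurm reads the same way as the bare 4000.

```shell
# The account comes from the command above; adjust the other values to your needs.
account="def-mcfarlas"
time_limit="0-01:00:00"   # D-HH:MM:SS, here one hour
mem_mb=4000               # --mem with no unit suffix is read as megabytes

# Assemble the command as text so we can inspect it before running it.
srun_cmd="srun --pty --account=$account -t $time_limit --mem=${mem_mb}M /bin/bash"
echo "$srun_cmd"

# Rough conversion of the memory request to gigabytes.
mem_gb=$(( mem_mb / 1000 ))
echo "memory request: about ${mem_gb} GB"
```

On a login node you would paste the printed line to actually start the session.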

We can now run scripts on the command line

module load r

Rscript plotting_DIC.R EGM19_cc_allSpecies_DIC_trimmed.txt

Yay it should have created some PDFs!!

We can look at the startingvals_loop_queuesub.sh script and talk about what an sbatch script needs to have to be submitted to the slurm scheduler.
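As a starting point for that discussion, here is a minimal sketch of an sbatch script. The job name and the echo line are made up, and which options are strictly required can vary by cluster, so treat this as an illustration rather than a template. The first line has to be a #! line naming the interpreter, and the #SBATCH lines have to come before the first real command.

```shell
#!/bin/bash
#SBATCH --account=def-mcfarlas   # allocation to charge (same account as the srun example above)
#SBATCH --time=0-00:15:00        # wall-clock limit (D-HH:MM:SS)
#SBATCH --mem=4000M              # memory per node
#SBATCH --job-name=minimal_demo  # hypothetical name shown in the queue

# Everything below the #SBATCH lines is ordinary bash that runs on the compute node.
job_message="Job started on $(hostname)"
echo "$job_message"
```

To bash the #SBATCH lines are just comments, so you can test a script like this with plain bash before handing it to sbatch.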

Another tip that I haven't mentioned: in many terminal programs, a single right click will paste whatever is in your clipboard.

Scripts used in demo!!

plotting_DIC.R

# script for plotting DIC values from entropy runs
# Amanda Meuser -- March 2023

# USAGE: Rscript plotting_DIC.R file 

# load packages
print("Loading packages...")
library(tidyverse)
library(tools)

# import DIC file
args <- commandArgs(TRUE)
file <- args[1]
DIC <- read.delim(file, header=T, sep = "\t")

print("Dimensions of file:")
dim(DIC)
print("Here's a sneak peek:")
head(DIC)

# average across the 3 reps
DIC_avg <- DIC %>% group_by(k) %>% summarize_all(mean)

# remove useless rep column
DIC_avg <- DIC_avg[,-2]

print("Dimensions of file:")
dim(DIC_avg)
print("Here's a sneak peek:")
head(DIC_avg)

print("The optimal value of k is:")
DIC_avg$k[which.min(DIC_avg$Model_DIC)]

basename1 <- basename(file)
basename <- file_path_sans_ext(basename1)
print("Basename is:")
basename

print("Creating plots...")
pdf(paste0(basename,"_plot.pdf"), width=11, height=11)
    ggplot(DIC_avg, aes(k, Model_DIC)) + geom_point()
dev.off()


pdf(paste0(basename,"_plot_extra.pdf"), width=11, height=11)
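# NOTE: par(mfrow) only affects base R graphics, so each ggplot below lands on its own page of the PDF
par(mfrow=c(1,3))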

    ggplot(DIC_avg, aes(k, Model_deviance)) + geom_point()
    ggplot(DIC_avg, aes(k, Effective_number_of_parameters)) + geom_point()
    ggplot(DIC_avg, aes(k, Model_DIC)) + geom_point()

dev.off()

EGM19_cc_allSpecies_DIC_trimmed.txt

k	rep	Model_deviance	Effective_number_of_parameters	Model_DIC
k1	rep1	116723.64	46280.78	163004.42
k1	rep2	116715.08	44934.97	161650.04
k1	rep3	116701.76	51221.82	167923.58
k2	rep1	93934.44	111045.87	204980.31
k2	rep2	93969.86	90658.80	184628.66
k2	rep3	94055.12	115396.61	209451.72
k3	rep1	93594.66	70059.74	163654.40
k3	rep2	93537.24	74041.86	167579.10
k3	rep3	92376.52	70654.95	163031.47
k4	rep1	91159.48	104401.82	195561.30
k4	rep2	91155.57	125011.60	216167.18
k4	rep3	91292.55	124920.52	216213.07
k5	rep1	89198.25	81986.15	171184.40
k5	rep2	89371.42	90499.67	179871.09
k5	rep3	89965.51	94692.05	184657.56
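Before handing a file like this to R, you can sanity-check it from the shell. The sketch below recreates the k1 and k2 rows from the table above in a temporary file and averages the Model_DIC column per k with awk. This is a quick cross-check, not a replacement for the R script.

```shell
# Recreate the first two values of k from the table above (tab-separated).
demo_file=$(mktemp)
printf 'k\trep\tModel_deviance\tEffective_number_of_parameters\tModel_DIC\n' > "$demo_file"
printf 'k1\trep1\t116723.64\t46280.78\t163004.42\n' >> "$demo_file"
printf 'k1\trep2\t116715.08\t44934.97\t161650.04\n' >> "$demo_file"
printf 'k1\trep3\t116701.76\t51221.82\t167923.58\n' >> "$demo_file"
printf 'k2\trep1\t93934.44\t111045.87\t204980.31\n' >> "$demo_file"
printf 'k2\trep2\t93969.86\t90658.80\t184628.66\n' >> "$demo_file"
printf 'k2\trep3\t94055.12\t115396.61\t209451.72\n' >> "$demo_file"

# Skip the header (NR > 1), add up column 5 for each k, then print the mean per k.
dic_means=$(awk -F '\t' 'NR > 1 { sum[$1] += $5; n[$1]++ }
    END { for (k in sum) printf "%s\t%.2f\n", k, sum[k] / n[k] }' "$demo_file" | sort)
echo "$dic_means"

# Clean up.
rm "$demo_file"
```

With the real file, replace "$demo_file" with EGM19_cc_allSpecies_DIC_trimmed.txt and drop the printf lines.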

startingvals_loop_queuesub.sh

#!/bin/bash
##USAGE: sbatch startingvals_loop_queuesub.sh k rep /path/to/mpgl_and_ldak
##example: sbatch ../../startingvals_loop_queuesub.sh 8 3 /project/rrg-emandevi/hybrid_ameuser/AMP22/starting_values_entropy
    ## Tips:
    ## modify k and starting values script/path on command line
    ## do NOT put a / at the end of the path on command line

### ---------- Job configuration --------------------------------------------

# Run dependent and permanent parameters
# will be run on complete nodes NOT partial

#SBATCH --nodes=1                       # number of nodes to use
#SBATCH --time=00-0:15:00               # time (DD-HH:MM:SS)
#SBATCH --account=rrg-emandevi          # account name
#SBATCH --job-name="loop_script"        # name to display in queue
#SBATCH --ntasks-per-node=1             # tasks per node (one core per node)
#SBATCH --mem=4000M                     # memory per node
#SBATCH --mail-user=<your_email>        # who to email
#SBATCH --mail-type=ALL                 # when to email



for k in $(seq 5 $1) ##start at 5, sequentially go up by 1 till it gets to value of $1
do
    for rep in $(seq 1 $2)
    do 
        ##echo $k $rep #was for testing!
        sbatch /project/rrg-emandevi/hybrid_ameuser/entropy_queuesub_27jul22.sh $3/*.mpgl $k $rep $3/qk"$k"inds.txt  
    done
done
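Before submitting a loop like this for real, it helps to do a dry run that only prints what would be submitted. This sketch uses made-up values in place of $1, $2 and $3, and echo in place of sbatch, so nothing reaches the scheduler.

```shell
# Hypothetical inputs standing in for $1 (largest k), $2 (number of reps) and $3 (data directory).
max_k=7
n_reps=2
data_dir="/path/to/mpgl_and_ldak"

job_count=0
for k in $(seq 5 "$max_k"); do
    for rep in $(seq 1 "$n_reps"); do
        # echo stands in for sbatch, so this only shows what would be submitted.
        echo "would submit: k=$k rep=$rep starting values=$data_dir/qk${k}inds.txt"
        job_count=$(( job_count + 1 ))
    done
done

echo "total jobs: $job_count"
```

The number of jobs is (largest k - 4) times the number of reps, so it grows quickly. Checking the count first avoids flooding the queue by accident.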
