Lab: ISAAC-NG Intro
Get set up on ISAAC
The following steps should already be complete, but just in case:
- Navigate to https://portal.acf.utk.edu/accounts/request
- Click on “I have a UT NetID”
- Authenticate with NetID, password and Duo two factor
- A form will then be presented with information collected from the University. There are at least two fields you need to update yourself, such as Salutation (Mr., Dr., etc.) and Citizenship; look for the required fields marked with an *
- Once the form is filled out click through to the next item
- Type the project name ISAAC-UTK0318 (with the alphabetic characters in uppercase) to request to be added. That should be it.
Notes
- You should never run jobs on the login node! It is only for setting up your scripts and launching your jobs through the scheduler.
- Keep the documentation handy!
- The user portal lets you see which projects you are part of, where you can store data, and how much space you have
- You should be a part of ISAAC-UTK0318 and see storage at /lustre/isaac/proj/UTK0318
- By default, the SLURM scheduler uses the directory you submit the job from as the working directory (see the sketch just below this list for how to change that)
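If you ever need a job to start somewhere other than the submit directory, one option is the --chdir directive (a standard sbatch option). A minimal sketch, where the path is just a placeholder for wherever you want the job to run:
#!/bin/bash
#SBATCH -J chdir-example
#SBATCH -A ISAAC-UTK0318
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:01:00
#SBATCH --chdir=/lustre/isaac/proj/UTK0318/analysis/<yourusername>

# confirm the job starts in the directory given above, not the submit directory
pwd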
Log into ISAAC Next Gen. After you enter your password, it will send you a Duo push.
ssh <yourusername>@login.isaac.utk.edu
The software system is just like spack, only the command is "module". (It's actually spack under the hood, but the module command is what is used in the documentation.) Let's see what is available.
module avail
BWA is something we have used before; let's see if it's installed.
module avail bwa
module load bwa
bwa
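A few other module subcommands are handy for keeping your environment tidy; these are standard module commands, shown here just as a quick reference.
module list          # show which modules are currently loaded
module unload bwa    # remove bwa from your environment
module purge         # unload everything and start fresh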
Go to the project directory
cd /lustre/isaac/proj/UTK0318/
You will see a directory set up for our practice. cd into it and create a directory for your lab
cd analysis
mkdir <yourusername>
cd <yourusername>
mkdir results
I've already downloaded our old solenopsis data, the genome, and indexed the genome with bwa.
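For reference only (you do not need to run this), indexing a genome with bwa looks roughly like the following, using the reference file we will align against below.
module load bwa
bwa index /lustre/isaac/proj/UTK0318/reference/solenopsis_invicta_genome_chr_3.fna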
Simple example - single sbatch command on the command line
You can load the software into your environment and then submit the job; the job will inherit your environment (i.e., the software will still be loaded).
Let's just get a quick test command going.
echo Worked! > results/output.txt
Let's try to run that through the scheduler.
sbatch -n 1 -N 1 -A ISAAC-UTK0318 -p condo-epp622 -q condo -t 00:01:00 --wrap="echo Worked! > results/sbatch_output.txt"
The flags tell the job scheduler all about your job, including what kind of resources it needs. This can be tricky - you need to know how many threads and how much RAM your job will take, so that you request a sufficient amount.
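As an illustration (the tool name and numbers here are made up, not what our jobs actually need), a job that wanted 4 threads and 8 GB of RAM could ask for them like this:
sbatch -n 1 -N 1 -c 4 --mem=8G -A ISAAC-UTK0318 -p condo-epp622 -q condo -t 00:30:00 \
    --wrap="some_multithreaded_tool --threads 4 input.txt > results/tool_output.txt"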
To see where your job is in the queue
squeue -u <yourusername>
Did it work? Do you see results/sbatch_output.txt? What is the slurm file that got created?
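To look at what the job left behind, something like the following works; the exact job ID in the slurm filename will be different for you.
ls                          # look for a file named slurm-<jobid>.out
cat slurm-*.out             # anything the job printed to stdout/stderr ends up here
cat results/sbatch_output.txt
sacct -u <yourusername>     # job accounting: state, exit code, elapsed time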
Now let's try a real command. Increase the time to 10 minutes and run bwa mem through the scheduler with an sbatch script.
Simple example - single command in an sbatch script
It's more typical and more readable to create a submission script.
In simple-bwa.qsh, put
#!/bin/bash
#SBATCH -J bwa
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH -A ISAAC-UTK0318
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:10:00
module load bwa
bwa mem \
-o results/SRR6922141_1.sam \
/lustre/isaac/proj/UTK0318/reference/solenopsis_invicta_genome_chr_3.fna \
/lustre/isaac/proj/UTK0318/raw_data/GBS_reads/SRR6922141_1.fastq
The directives at the top tell the job scheduler all about your job, just like the flags did. bwa mem by default only needs one thread, so we'll keep that.
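If you did want more threads, you would raise the --cpus-per-task directive and pass a matching -t to bwa mem (a real bwa mem option). A sketch of the two pieces of simple-bwa.qsh that would change, with 4 threads as an arbitrary example:
#SBATCH --cpus-per-task=4

bwa mem \
    -t 4 \
    -o results/SRR6922141_1.sam \
    /lustre/isaac/proj/UTK0318/reference/solenopsis_invicta_genome_chr_3.fna \
    /lustre/isaac/proj/UTK0318/raw_data/GBS_reads/SRR6922141_1.fastq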
Submit the script
sbatch simple-bwa.qsh
You can again track progress with squeue and look at the output files.
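A quick way to sanity-check the result (file names assume the script above):
ls -lh results/
head results/SRR6922141_1.sam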
Multiple jobs - single sbatch command inside a for loop
We learned about for loops in class, and we can use them here too.
Let's build a for loop in an sbatch script called loop-bwa.qsh:
#!/bin/bash
#SBATCH -J bwa_loop
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH -A ISAAC-UTK0318
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:30:00
module load bwa
for FILE in /lustre/isaac/proj/UTK0318/raw_data/GBS_reads/*fastq
do
BASE=$( basename $FILE )
bwa mem \
-o results/${BASE%%.fastq}.sam \
/lustre/isaac/proj/UTK0318/reference/solenopsis_invicta_genome_chr_3.fna \
${FILE}
done
This loop runs the alignments one after another inside a single job. A for loop won't help if you want the jobs to run in parallel: if you put the bwa commands in the background, the main script will finish, the scheduler will think the job is done, and your background processes will be killed before they complete. Instead, we are going to use a task array (next class); see the sketch below.
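As a rough preview of next class (the array range and the way each task picks its file are just one possible approach, not necessarily what we will use), a task-array version could look something like this:
#!/bin/bash
#SBATCH -J bwa_array
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH -A ISAAC-UTK0318
#SBATCH -p condo-epp622
#SBATCH -q condo
#SBATCH -t 00:10:00
#SBATCH --array=1-4          # one task per fastq file; adjust to the actual file count

module load bwa

# each array task grabs the Nth fastq file, so the alignments run as separate jobs in parallel
FILE=$( ls /lustre/isaac/proj/UTK0318/raw_data/GBS_reads/*fastq | sed -n "${SLURM_ARRAY_TASK_ID}p" )
BASE=$( basename $FILE )

bwa mem \
    -o results/${BASE%%.fastq}.sam \
    /lustre/isaac/proj/UTK0318/reference/solenopsis_invicta_genome_chr_3.fna \
    ${FILE}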