Working Southern Appalachian Brook Trout GTseq pipeline - RebeccaJSmith89/SABT_bioinformatics GitHub Wiki

Reference file

  1. Go to the directory you would like to work in on the supercomputer. Locate your reference file there, or scp it to this directory from your home computer
    • To move a file, use the scp command:

scp "C:/Users/BeccaJo/[insert home file location]" [email protected]:GT_seq/

  • On a PC you need the "" (quotes) around the path of the file you are moving, then a : (colon) between the supercomputer address and the path where you want the file stored
  2. Change the .txt file of references to .fasta
  3. Index the reference sequences using BWA

module load BWA/0.7.17-GCCcore-11.3.0 # load BWA to index the file

Run the index command:

bwa index [reference fasta file] # Output will yield several index files in the folder; we will use these later for the pileup
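As a quick sanity check, `bwa index` leaves five new files next to the reference, with extensions .amb, .ann, .bwt, .pac, and .sa. A sketch of the check (`reference.fasta` is a placeholder for your reference file name):

```shell
# sketch: confirm the five BWA index files exist (reference.fasta is a placeholder)
ref=reference.fasta
for ext in amb ann bwt pac sa; do
    [ -f "${ref}.${ext}" ] || echo "missing ${ref}.${ext}"
done
```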

Load Bowtie2 to build the reference index

module load Bowtie2/2.4.5-GCC-11.3.0

# build command to prepare the reference file for alignment

bowtie2-build [reference fasta file] [output name]
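bowtie2-build writes six index files named after [output name] (refhalf in the alignment command below). A quick check, as a sketch:

```shell
# sketch: bowtie2-build should produce six .bt2 files for the index name (refhalf here)
idx=refhalf
for f in "${idx}".{1,2,3,4}.bt2 "${idx}".rev.{1,2}.bt2; do
    [ -f "$f" ] || echo "missing $f"
done
```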

Aligning to the reference

Simple command for a single file:

bowtie2 -p 32 -x refhalf -U fastqs/initialSfoVARI24G_0001.fastq -S SamsO/Sf0001.sam

  • we wrote a Slurm file with a **loop** to align all of our reads to the reference
  • make sure you are in the directory where you want to execute the commands
  • open the Slurm file with nano [slurm file name].run, then tell Rocky to run it with sbatch [file name].run

squeue # to see if it's running
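The loop inside that Slurm file might look like the sketch below. The job directives and the naming rule that turns initialSfoVARI24G_0001.fastq into Sf0001.sam are assumptions based on the single-file command above; adjust them to your setup:

```shell
#!/bin/bash
#SBATCH --job-name=bt2_align
#SBATCH --ntasks=32
#SBATCH --time=24:00:00

module load Bowtie2/2.4.5-GCC-11.3.0

# align every fastq in fastqs/ against the refhalf index, one SAM per sample
for fq in fastqs/*.fastq; do
    base=$(basename "$fq" .fastq)                    # e.g. initialSfoVARI24G_0001
    bowtie2 -p 32 -x refhalf -U "$fq" -S "SamsO/Sf${base##*_}.sam"   # keeps the trailing sample number
done
```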

  • then we converted our SAM files to BAM files; each BAM file gets a .bai, which is an index (the BAM and .bai files are the input for the next step)

  • Use the Bam sort Slurm script to run the loop on all files

  • or use the bamsam.run nano script; you shouldn't need to change any file names
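If you ever need to redo that conversion by hand, the loop is roughly the sketch below; the SAMtools module name and the SamsO/ and bams/ paths are assumptions, so check `module avail` and your own directories:

```shell
module load SAMtools/1.16.1-GCC-11.3.0   # module name is an assumption; check `module avail SAMtools`

mkdir -p bams
# convert each SAM to a coordinate-sorted BAM, then index it
for sam in SamsO/*.sam; do
    base=$(basename "$sam" .sam)
    samtools sort -o "bams/${base}.bam" "$sam"
    samtools index "bams/${base}.bam"    # writes bams/${base}.bam.bai
done
```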

  • To combine the first plate with new plates/sequences, combine the reads at this pileup stage: gather the .bam and .bai files into one folder, then move on to the pileup script

  • next we 'pile up' all the reads from each individual, ending with a VCF file which we can use in R
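A minimal sketch of that pileup step using bcftools; the module name, reference.fasta, and the bams/ folder are assumptions, and the actual pileup script on the cluster may differ:

```shell
module load BCFtools/1.15.1-GCC-11.3.0   # module name is an assumption

# list every BAM (their .bai indexes must sit alongside), then pile up and call variants
ls bams/*.bam > bam_list.txt
bcftools mpileup -f reference.fasta -b bam_list.txt | \
    bcftools call -mv -Oz -o variants.vcf.gz   # -m multiallelic caller, -v variant sites only
bcftools index variants.vcf.gz                 # variants.vcf.gz can then be read into R
```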