Working Southern Appalachian Brook Trout GTseq pipeline - RebeccaJSmith89/SABT_bioinformatics GitHub Wiki
Reference file
- Go to the directory you would like to work in on the supercomputer. Locate your reference file there, or scp it to this directory from your home computer.
- To move it, use the scp command:

scp "C:/Users/BeccaJo/[insert home file location]" [username]@[supercomputer address]:GT_seq/

- On a PC you need the "" around the file you are moving, then a : between the supercomputer address and the pathway to where you want it stored.
- Change the .txt file of reference sequences to .fasta.
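The rename is just a `mv`; a minimal sketch using a toy two-locus panel (the file names here are hypothetical stand-ins for your real reference file):

```shell
# Toy example: a small reference panel saved as .txt (stand-in for your real file)
printf '>locus1\nACGTACGT\n>locus2\nTTGGCCAA\n' > references.txt

# Rename so downstream tools (bwa, bowtie2) recognize it as FASTA
mv references.txt references.fasta

# Sanity check: FASTA headers start with ">"
grep -c '^>' references.fasta   # → 2
```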
- Index the reference sequences using BWA:

module load BWA/0.7.17-GCCcore-11.3.0
# load BWA to index the file

Run the index command:

bwa index [reference fasta file]
# Output will yield several index files (.amb, .ann, .bwt, .pac, .sa) in the folder; we will use these later for the pileup step
Load Bowtie2 to build the reference index:

module load Bowtie2/2.4.5-GCC-11.3.0

# build command to create the reference index / prepare for alignment
bowtie2-build [reference fasta file] [output name]
Aligning to the reference
- Simple command for one file:

bowtie2 -p 32 -x refhalf -U fastqs/initialSfoVARI24G_0001.fastq -S SamsO/Sf0001.sam
- We wrote a Slurm file with a **loop** to align all of our reads to the reference.
- Make sure you are in the directory where you want to execute the commands.
- I created the Slurm file with nano:

nano [slurm file name].run

- Then tell Rocky to run the script with:

sbatch [file name].run

squeue
# to see if it is running
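The loop script can be sketched roughly like this. The `#SBATCH` resource requests are assumptions (adjust for your cluster), and the directory and index names (`fastqs/`, `SamsO/`, `refhalf`) are carried over from the single-file command above:

```shell
#!/bin/bash
#SBATCH --job-name=bt2_align
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=32
#SBATCH --time=24:00:00

module load Bowtie2/2.4.5-GCC-11.3.0

# Align every fastq in fastqs/ to the refhalf index,
# writing one SAM file per individual into SamsO/
for fq in fastqs/*.fastq; do
    sample=$(basename "$fq" .fastq)
    bowtie2 -p 32 -x refhalf -U "$fq" -S "SamsO/${sample}.sam"
done
```

Submit it with `sbatch` as above; each fastq produces a SAM named after its sample.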
- Then we converted our SAM files to BAM files; each BAM file gets a .bai index file [need output for next step?]
- Use the Bam sort Slurm script and run the loop on all files,
- or use the bamsam.run nano script - you shouldn't need to change any file names.
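The SAM-to-BAM step can be sketched with samtools; the module version and the `Bams/` output directory are assumptions (check `module avail samtools` on your cluster):

```shell
module load SAMtools/1.16.1-GCC-11.3.0   # module name/version is an assumption

mkdir -p Bams

# Convert each SAM to a sorted BAM, then index it to produce the .bai
for sam in SamsO/*.sam; do
    sample=$(basename "$sam" .sam)
    samtools sort -o "Bams/${sample}.bam" "$sam"   # samtools sort reads SAM and writes sorted BAM
    samtools index "Bams/${sample}.bam"            # creates Bams/${sample}.bam.bai
done
```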
- To combine the first plate with new plates/sequences, we combine the reads at this pileup stage: gather a folder with the .bam & .bai files, then move on to the pileup script.
- Next we 'pile up' all the reads from each individual; we will end with a VCF file which we can use in R.
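One way to sketch the pileup step is with bcftools (`bcftools mpileup` replaced the older `samtools mpileup`); the module name, file names, and output name here are all assumptions, and your pipeline's pileup script may use a different caller:

```shell
module load BCFtools/1.15.1-GCC-11.3.0   # module name/version is an assumption

# Pile up reads from every individual's BAM against the reference,
# call variants, and write one multi-sample VCF for analysis in R
ls Bams/*.bam > bam_list.txt
bcftools mpileup -f references.fasta -b bam_list.txt | \
    bcftools call -mv -Oz -o variants.vcf.gz
bcftools index variants.vcf.gz
```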