Outline TotalRNA Commands - SabaLab/RNASeq_Scripts GitHub Wiki

Outline of commands run to process TotalRNA.

Trimming script will perform these steps:
1. Count Raw reads - For each Raw File:
  gunzip -c $file | wc -l > $COUNTFILE
2. Summarize individual count files - For each Raw Count File:
  echo $file"\n" >> $RAWSUMMARY cat $file >> $RAWSUMMARY
3. Run cutadapt - For each Pair of Raw Files:
  cutadapt -q 20 -m 20 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o $TRIMMEDFILE -p $TRIMMEDFILE2 $fileR1 $fileR2 > $TRIMMEDFILEOUT 2>&1
4. Count reads and calculate avg length - For each trimmed fq file
  gunzip -c $file | awk '/@(GWZHISEQ|HWI|SL-HBW|HISEQ|D00|K00)/ {getline; print length($0)}' | awk -v sample="$file" '{sum+=$1} END {print sample,sum/NR,NR}' > $COUNTFILE
5. Summarize count files - For each trimmed count file:
  cat $file >> $TRIMMEDSUMMARY
Aligning to rRNA to Cleanup Reads will perform:
1. Align to rRNA, unmapped will be used for further processing - For each pair of files:
  bowtie2 -q -p $MAX -k 2 --very-sensitive --reorder --mp 10,4 --fr -t -x $INDEX -1 $FileR1 -2 $FileR2 2> $OUTPATH/alignSummary.txt | tee >(samtools view -bS -F 0x4 - > $OUTPATH/aligned.bam 2> /dev/null) >(samtools view -bS -f 0x4 - > $OUTPATH/unmapped.bam 2> /dev/null) | grep errors
2. Convert to compressed paired fastq files while removing reads whose mate aligned to rRNA - For each sample:
  bamToFastQ_py2.sh $OUTPATH unmapped.bam
  - Convert to right and left sam files to read in python:
    samtools view -f 0x0040 $UNMAPPEDBAMFILE > $RIGHT
    samtools view -f 0x0080 $UNMAPPEDBAMFILE > $LEFT
  - Create unmapped.end#.fq.gz files:
    python filterUnpaired.py $LEFT $RIGHT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end1.fq.gz
    python filterUnpaired.py $RIGHT $LEFT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end2.fq.gz
  - Count both files:
    gunzip -c $OUTPUT/unmapped.end#.fq.gz | wc -l >> $COUNTFILE
Previous Method for aligning rRNA to cleanup reads was(used for all batches before spring 2017):
1. Align to rRNA, unmapped will be used for further processing - For each pair of files:
  hisat2 -q -p $MAX -k 2 --reorder --no-mixed --no-discordant --rna-strandness FR --novel-splicesite-outfile $OUTPATH/splice_junct.bed -x $INDEX -S $OUTPATH/output.sam -1 $FileR1 -2 $FileR2
2. Convert sam to aligned/unmapped bam files:
  samtools view -bS -F 0x4 $OPATH/output.sam > $OPATH/aligned.bam samtools view -bS -f 0x4 $OPATH/output.sam > $OPATH/unmapped.bam
3. Convert to compressed paired fastq files while removing reads whose mate aligned to rRNA - For each sample:
  bamToFastQ_py2.sh $OUTPATH unmapped.bam
  - Convert to right and left sam files to read in python:
    samtools view -f 0x0040 $UNMAPPEDBAMFILE > $RIGHT
    samtools view -f 0x0080 $UNMAPPEDBAMFILE > $LEFT
  - Create unmapped.end#.fq.gz files:
    python filterUnpaired.py $LEFT $RIGHT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end1.fq.gz
    python filterUnpaired.py $RIGHT $LEFT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end2.fq.gz
  - Count both files:
    gunzip -c $OUTPUT/unmapped.end#.fq.gz | wc -l >> $COUNTFILE