Outline TotalRNA Commands - SabaLab/RNASeq_Scripts GitHub Wiki

Outline of commands run to process TotalRNA.

  • Trimming script will perform these steps:

    1. Count Raw reads - For each Raw File:
      gunzip -c $file | wc -l > $COUNTFILE
    2. Summarize individual count files - For each Raw Count File:
      echo $file"\n" >> $RAWSUMMARY cat $file >> $RAWSUMMARY
    3. Run cutadapt - For each Pair of Raw Files:
      cutadapt -q 20 -m 20 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o $TRIMMEDFILE -p $TRIMMEDFILE2 $fileR1 $fileR2 > $TRIMMEDFILEOUT 2>&1
    4. Count reads and calculate avg length - For each trimmed fq file
      gunzip -c $file | awk '/@(GWZHISEQ|HWI|SL-HBW|HISEQ|D00|K00)/ {getline; print length($0)}' | awk -v sample="$file" '{sum+=$1} END {print sample,sum/NR,NR}' > $COUNTFILE
    5. Summarize count files - For each trimmed count file:
      cat $file >> $TRIMMEDSUMMARY
  • Aligning to rRNA to Cleanup Reads will perform:

    1. Align to rRNA, unmapped will be used for further processing - For each pair of files:
      bowtie2 -q -p $MAX -k 2 --very-sensitive --reorder --mp 10,4 --fr -t -x $INDEX -1 $FileR1 -2 $FileR2 2> $OUTPATH/alignSummary.txt | tee >(samtools view -bS -F 0x4 - > $OUTPATH/aligned.bam 2> /dev/null) >(samtools view -bS -f 0x4 - > $OUTPATH/unmapped.bam 2> /dev/null) | grep errors
    2. Convert to compressed paired fastq files while removing reads whose mate aligned to rRNA - For each sample:
      bamToFastQ_py2.sh $OUTPATH unmapped.bam
      • Convert to right and left sam files to read in python:
        samtools view -f 0x0040 $UNMAPPEDBAMFILE > $RIGHT
        samtools view -f 0x0080 $UNMAPPEDBAMFILE > $LEFT
      • Create unmapped.end#.fq.gz files:
        python filterUnpaired.py $LEFT $RIGHT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end1.fq.gz
        python filterUnpaired.py $RIGHT $LEFT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end2.fq.gz
      • Count both files:
        gunzip -c $OUTPUT/unmapped.end#.fq.gz | wc -l >> $COUNTFILE
  • Previous Method for aligning rRNA to cleanup reads was(used for all batches before spring 2017):

    1. Align to rRNA, unmapped will be used for further processing - For each pair of files:
      hisat2 -q -p $MAX -k 2 --reorder --no-mixed --no-discordant --rna-strandness FR --novel-splicesite-outfile $OUTPATH/splice_junct.bed -x $INDEX -S $OUTPATH/output.sam -1 $FileR1 -2 $FileR2
    2. Convert sam to aligned/unmapped bam files:
      samtools view -bS -F 0x4 $OPATH/output.sam > $OPATH/aligned.bam samtools view -bS -f 0x4 $OPATH/output.sam > $OPATH/unmapped.bam
    3. Convert to compressed paired fastq files while removing reads whose mate aligned to rRNA - For each sample:
      bamToFastQ_py2.sh $OUTPATH unmapped.bam
      • Convert to right and left sam files to read in python:
        samtools view -f 0x0040 $UNMAPPEDBAMFILE > $RIGHT
        samtools view -f 0x0080 $UNMAPPEDBAMFILE > $LEFT
      • Create unmapped.end#.fq.gz files:
        python filterUnpaired.py $LEFT $RIGHT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end1.fq.gz
        python filterUnpaired.py $RIGHT $LEFT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end2.fq.gz
      • Count both files:
        gunzip -c $OUTPUT/unmapped.end#.fq.gz | wc -l >> $COUNTFILE