Outline TotalRNA Commands - SabaLab/RNASeq_Scripts GitHub Wiki
Outline of commands run to process TotalRNA.
-
Trimming script will perform these steps:
- Count Raw reads - For each Raw File:
gunzip -c $file | wc -l > $COUNTFILE
- Summarize individual count files - For each Raw Count File:
echo $file"\n" >> $RAWSUMMARY cat $file >> $RAWSUMMARY
- Run cutadapt - For each Pair of Raw Files:
cutadapt -q 20 -m 20 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACCCGTCCCGATCTCGTATGCCGTCTTCTGCTTG -A AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGCCGTATCATT -o $TRIMMEDFILE -p $TRIMMEDFILE2 $fileR1 $fileR2 > $TRIMMEDFILEOUT 2>&1
- Count reads and calculate avg length - For each trimmed fq file
gunzip -c $file | awk '/@(GWZHISEQ|HWI|SL-HBW|HISEQ|D00|K00)/ {getline; print length($0)}' | awk -v sample="$file" '{sum+=$1} END {print sample,sum/NR,NR}' > $COUNTFILE
- Summarize count files - For each trimmed count file:
cat $file >> $TRIMMEDSUMMARY
- Count Raw reads - For each Raw File:
-
Aligning to rRNA to Cleanup Reads will perform:
- Align to rRNA, unmapped will be used for further processing - For each pair of files:
bowtie2 -q -p $MAX -k 2 --very-sensitive --reorder --mp 10,4 --fr -t -x $INDEX -1 $FileR1 -2 $FileR2 2> $OUTPATH/alignSummary.txt | tee >(samtools view -bS -F 0x4 - > $OUTPATH/aligned.bam 2> /dev/null) >(samtools view -bS -f 0x4 - > $OUTPATH/unmapped.bam 2> /dev/null) | grep errors
- Convert to compressed paired fastq files while removing reads whose mate aligned to rRNA - For each sample:
bamToFastQ_py2.sh $OUTPATH unmapped.bam
- Convert to right and left sam files to read in python:
samtools view -f 0x0040 $UNMAPPEDBAMFILE > $RIGHT
samtools view -f 0x0080 $UNMAPPEDBAMFILE > $LEFT
- Create unmapped.end#.fq.gz files:
python filterUnpaired.py $LEFT $RIGHT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end1.fq.gz
python filterUnpaired.py $RIGHT $LEFT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end2.fq.gz
- Count both files:
gunzip -c $OUTPUT/unmapped.end#.fq.gz | wc -l >> $COUNTFILE
- Convert to right and left sam files to read in python:
- Align to rRNA, unmapped will be used for further processing - For each pair of files:
-
Previous Method for aligning rRNA to cleanup reads was(used for all batches before spring 2017):
- Align to rRNA, unmapped will be used for further processing - For each pair of files:
hisat2 -q -p $MAX -k 2 --reorder --no-mixed --no-discordant --rna-strandness FR --novel-splicesite-outfile $OUTPATH/splice_junct.bed -x $INDEX -S $OUTPATH/output.sam -1 $FileR1 -2 $FileR2
- Convert sam to aligned/unmapped bam files:
samtools view -bS -F 0x4 $OPATH/output.sam > $OPATH/aligned.bam samtools view -bS -f 0x4 $OPATH/output.sam > $OPATH/unmapped.bam
- Convert to compressed paired fastq files while removing reads whose mate aligned to rRNA - For each sample:
bamToFastQ_py2.sh $OUTPATH unmapped.bam
- Convert to right and left sam files to read in python:
samtools view -f 0x0040 $UNMAPPEDBAMFILE > $RIGHT
samtools view -f 0x0080 $UNMAPPEDBAMFILE > $LEFT
- Create unmapped.end#.fq.gz files:
python filterUnpaired.py $LEFT $RIGHT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end1.fq.gz
python filterUnpaired.py $RIGHT $LEFT | samtools fastq -n - | gzip > $OUTPUT/unmapped.end2.fq.gz
- Count both files:
gunzip -c $OUTPUT/unmapped.end#.fq.gz | wc -l >> $COUNTFILE
- Convert to right and left sam files to read in python:
- Align to rRNA, unmapped will be used for further processing - For each pair of files: