FastQC - MattHuff/scRNASeq_011224 GitHub Wiki

I discussed the FastQC pipeline in my previous documentation. As there are no changes in the overall pipeline, this section will be brief.

Obtaining Raw Reads

The raw reads were downloaded from Azenta to the Palmetto Cluster. The project directory, located under my main directory in MUSC's branch of the Palmetto Cluster, is named Jordan_scRNASeq_011024. Within it, the raw reads are stored in a directory named raw_data, and each step of the analysis has its own sub-directory under the analysis directory.

1. FastQC

Within the analysis directory, create a new sub-directory for FastQC:

mkdir 1_fastqc
cd 1_fastqc

Using your text editor of choice, paste the following into a file named run_fastqc.qsh:

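#!/bin/bash

#PBS -N 1_Fastqc
#PBS -l walltime=03:00:00
#PBS -j oe

# Load the shell configuration so mamba is available, then activate the FastQC environment
source ~/.bashrc
mamba activate fastqc
# Run from the directory where qsub was invoked
cd "$PBS_O_WORKDIR"

for f in /zfs/musc3/huffmat/Jordan_scRNASeq_011024/raw_data/*.fastq.gz
do
	# Strip the directory, then everything from ".fastq" onward, to get the sample name
	filename=$(basename "$f")
	base="${filename%%.fastq*}"
	echo "filename $filename base $base"
	mkdir -p "$base.fastQC"

	# Write the report to a per-sample directory and capture stdout and stderr in a log file
	fastqc -o "$base.fastQC" --threads 10 "$f" > "$base.fastQC.out" 2>&1
done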
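
Before submitting the job, the name derivation used inside the loop can be checked on its own. The following is a minimal sketch; the function name sample_base and the example file name are hypothetical and not part of the original script.

```shell
#!/bin/bash
# Reproduce the loop's name derivation: drop the directory part,
# then drop everything from the first ".fastq" onward.
sample_base() {
	local filename
	filename=$(basename "$1")
	printf '%s\n' "${filename%%.fastq*}"
}

# Hypothetical file name, used only for illustration
sample_base /path/to/raw_data/SampleA_R1_001.fastq.gz   # prints SampleA_R1_001
```

The printed value is the prefix used for both the output directory (SampleA_R1_001.fastQC) and the log file (SampleA_R1_001.fastQC.out).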

Run qsub run_fastqc.qsh to queue this job on the server. For me, it took just under 3 hours to process all four files, which is close to the 3-hour walltime requested in the script. If the job is killed before it finishes, you may want to raise the walltime or run the remaining files individually in an interactive session (launched with qsub -I).
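
If the job is killed partway through, it helps to know which files still need to be run. The sketch below lists the raw FASTQ files that have no non-empty HTML report yet. The function name missing_fastqc is hypothetical, and the sketch assumes the directory layout created by the script above together with FastQC's usual report naming of <input name>_fastqc.html.

```shell
#!/bin/bash
# List raw FASTQ files that have no finished FastQC HTML report yet.
# Usage: missing_fastqc RAW_DIR OUT_DIR
#   RAW_DIR  directory holding the *.fastq.gz files
#   OUT_DIR  directory holding the <base>.fastQC output directories
missing_fastqc() {
	local raw_dir=$1 out_dir=$2 f filename base
	for f in "$raw_dir"/*.fastq.gz
	do
		# Skip the literal pattern when the glob matches nothing
		[ -e "$f" ] || continue
		filename=$(basename "$f")
		base="${filename%%.fastq*}"
		# Assumed report name: <base>_fastqc.html inside <base>.fastQC
		if [ ! -s "$out_dir/$base.fastQC/${base}_fastqc.html" ]; then
			printf '%s\n' "$f"
		fi
	done
}

# Example call from inside 1_fastqc:
# missing_fastqc /zfs/musc3/huffmat/Jordan_scRNASeq_011024/raw_data .
```

Each path printed can then be passed to fastqc by hand in the interactive session. An empty report file is treated as missing, since a killed job may leave partial output behind.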

MultiQC

Make sure your mamba environment is active by running mamba activate fastqc. MultiQC can be run in an interactive session or on its own; run it from the 1_fastqc directory so that it picks up the FastQC results. Remember to rename the default report file to something more informative.

multiqc .
mv multiqc_report.html Jordan_scRNASeq_raw_multiqc_report.html
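
A plain mv silently overwrites any earlier report that has the same name. As a hedged alternative, the sketch below wraps the rename in a small guard; the function name rename_report is hypothetical and not part of MultiQC.

```shell
#!/bin/bash
# Rename MultiQC's default report, refusing to overwrite an existing file.
# Usage: rename_report NEW_NAME   (run in the directory where multiqc was run)
rename_report() {
	local new_name=$1
	if [ ! -f multiqc_report.html ]; then
		echo "multiqc_report.html not found; did multiqc finish?" >&2
		return 1
	fi
	if [ -e "$new_name" ]; then
		echo "$new_name already exists; not overwriting" >&2
		return 1
	fi
	mv multiqc_report.html "$new_name"
}

# Example call:
# rename_report Jordan_scRNASeq_raw_multiqc_report.html
```

MultiQC also accepts a --filename (-n) option that sets the report name when the report is created, which would make the separate rename unnecessary.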