Lab: Quality Assessment - statonlab/EPP_575_RNASeq_Workshop GitHub Wiki

Software

FastQC Website

MultiQC Website

Quality assessment of read files

All raw data will be located in /pickett_shared/teaching/EPP575_Jan2022/raw_data/.

To confirm, run:

ls /pickett_shared/teaching/EPP575_Jan2022/raw_data/

You should see 8 files. Does that mean we have 8 samples?
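One way to approach this question is to compare the number of files with the number of distinct names left once the _1/_2 suffix is removed. The sketch below runs on made-up file names in a scratch directory (sampleA and sampleB are placeholders, not the workshop data). It assumes paired files end in _1.fastq and _2.fastq, as the name SRR17062759_1.fastq suggests.

```shell
# Hypothetical file names, created in a scratch directory purely for illustration.
demo=$(mktemp -d)
touch "$demo/sampleA_1.fastq" "$demo/sampleA_2.fastq" \
      "$demo/sampleB_1.fastq" "$demo/sampleB_2.fastq"

# Count the files, then count the unique names left after removing _1/_2.
n_files=$(ls "$demo" | wc -l | tr -d ' ')
n_samples=$(ls "$demo" | sed 's/_[12]\.fastq$//' | sort -u | wc -l | tr -d ' ')

echo "files: $n_files, unique sample prefixes: $n_samples"
rm -r "$demo"
```

To try the same idea on the real data, point the two ls commands at the raw_data directory instead of "$demo".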

Symbolic links

These files are large, and copying each one would use too much disk space on our system. Rather than copying files to your directory, I recommend creating symbolic links.

Navigate to /pickett_shared/teaching/EPP575_Jan2022/analysis, and create a directory with your UTK user name; this is where you will store your output files.

mkdir <your_username>
cd <your_username>

Within this new directory, create a sub-directory named raw_data. From inside raw_data, run the command:

ln -s /pickett_shared/teaching/EPP575_Jan2022/raw_data/SRR17062759_1.fastq

This creates a symbolic link to the file. Rather than duplicating the data, the command creates a small special file that points to the original file.
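To see what a symbolic link looks like without touching the shared data, here is a minimal sketch that uses a scratch directory and a tiny stand-in file. The names shared, mine, and example_1.fastq are placeholders for illustration only.

```shell
# Scratch directory with a stand-in "original" file (hypothetical names).
demo=$(mktemp -d)
mkdir "$demo/shared" "$demo/mine"
echo "@read1" > "$demo/shared/example_1.fastq"

# Same pattern as the command above: with only a target given, ln -s creates
# a link with the same file name in the current directory.
cd "$demo/mine"
ln -s "$demo/shared/example_1.fastq"

ls -l example_1.fastq                  # the listing shows "-> <path to original>"
link_target=$(readlink example_1.fastq)
link_content=$(cat example_1.fastq)    # reading the link reads the original file
echo "points to: $link_target"
```

Deleting the link removes only the pointer; the original file is left alone.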

Navigate back up to your main directory (the one named after your user name), and create a new sub-directory named analysis. Inside analysis, create a sub-directory to hold the first step of our analysis:

mkdir 1_fastqcRaw
cd 1_fastqcRaw
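The directory layout built up over the last few steps can be sketched in one go. In this sketch a scratch directory stands in for the shared analysis directory and student1 is a placeholder user name, so nothing is created on the shared drive.

```shell
# A scratch directory stands in for the shared analysis directory;
# "student1" is a placeholder user name.
base=$(mktemp -d)
user=student1

# -p creates parent directories as needed and does not fail if they exist.
mkdir -p "$base/$user/raw_data"
mkdir -p "$base/$user/analysis/1_fastqcRaw"

# From inside 1_fastqcRaw, ../../raw_data resolves to the user's raw_data directory.
cd "$base/$user/analysis/1_fastqcRaw"
resolved=$(cd ../../raw_data && pwd)
echo "$resolved"
```

This is why later commands run from 1_fastqcRaw can reach the linked read file through the relative path ../../raw_data.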

FastQC is not available by default on Sphinx; load it with the following command:

spack load fastqc@<version>%<compiler>@<version>

This is an alternative to the way Meg discussed loading Spack packages on Friday. With that method, the command would look like spack load /wrz2q7j; use whichever method you prefer.

Test that fastqc loaded properly for you. What message pops up if you just run fastqc? How about fastqc -h?
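If you want a quick yes/no answer on whether a program is available before running it, the shell can tell you. The helper below is a small sketch; check_tool is a name made up for this example and is not part of Spack or FastQC.

```shell
# Report whether a program is on the PATH; command -v is the POSIX way to ask.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found; has the package been loaded?"
        return 1
    fi
}

check_tool fastqc || true   # prints one of the two messages above
```

The same function works for any other tool, for example check_tool multiqc.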

To run fastqc on your data, run the following:

mkdir SRR17062759_1.fastQC
fastqc -o SRR17062759_1.fastQC ../../raw_data/SRR17062759_1.fastq >& SRR17062759_1.fastQC.out

The fastqc command creates an HTML report, which cannot be viewed in the terminal. Use the scp command to copy the file to your personal computer, then open it in a web browser. Run scp from a terminal on your personal computer, not on Sphinx:

scp <your_username>@sphinx.ag.utk.edu:/pickett_shared/teaching/EPP575_Jan2022/analysis/<your_username>/analysis/1_fastqcRaw/SRR17062759_1.fastQC/SRR17062759_1_fastqc.html .
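The FastQC command above ends with >& SRR17062759_1.fastQC.out, which saves the program's screen output to a file instead of printing it. The sketch below shows the idea with a stand-in function (noisy is made up for this example and is not FastQC). It uses the portable form > file 2>&1, which bash treats as equivalent to >& file.

```shell
# A stand-in command that writes to both stdout and stderr (not FastQC itself).
noisy() {
    echo "progress message"
    echo "warning message" >&2
}

log=$(mktemp)

# "> file 2>&1" sends stdout to the file, then points stderr at the same place.
# In bash, ">& file" is shorthand for this.
noisy > "$log" 2>&1

log_lines=$(wc -l < "$log" | tr -d ' ')
cat "$log"      # both messages were captured in the file
rm "$log"
```

If a run fails, the .out file is the first place to look for the error message.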

Challenge

We have performed quality assessment on one file of the read pair for sample SRR17062759. Repeat this for the second file of the pair.

MultiQC

Once you have both FastQC HTML files, you can run MultiQC to aggregate the results. Load it with the following command:

spack load multiqc@<version>%<compiler>@<version>

In the same directory where you ran FastQC, run the following command:

multiqc .

What is the importance of the . in this command?

Once it has finished running, you will have a file in your 1_fastqcRaw directory named multiqc_report.html. This is the default file name of every run of MultiQC; to avoid overwriting older MultiQC reports, I recommend renaming the file:

mv multiqc_report.html EPP575_raw_multiqc_report.html
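If you rename reports often, it helps to make the rename itself safe. The sketch below works on a stand-in file in a scratch directory (not real MultiQC output) and uses mv -n, which refuses to replace a file that already exists. MultiQC also has a --filename option for naming the report when you run it; check multiqc --help for the version installed on Sphinx before relying on it.

```shell
# Scratch directory with a stand-in report (not real MultiQC output).
demo=$(mktemp -d)
cd "$demo"
echo "<html>run 1</html>" > multiqc_report.html

# -n tells mv not to overwrite an existing destination file.
mv -n multiqc_report.html EPP575_raw_multiqc_report.html

# A later report can now use the default name; the renamed copy is untouched.
echo "<html>run 2</html>" > multiqc_report.html
kept=$(cat EPP575_raw_multiqc_report.html)
ls
```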

Assignment

Send the file named EPP575_raw_multiqc_report.html to [email protected] and [email protected]. This file must contain quality assessment information for both files of the read pair for sample SRR17062759.