1D Short Read Quality Control and Trimming - NU-CPGME/sl_workshop_2024 GitHub Wiki

July / August, 2024

Developed by:
Egon A. Ozer, MD PhD
Ramon Lorenzo Redondo, PhD


Before we start:

micromamba activate assembly

Using cd, navigate to the folder with our example data, demo_data, and list the contents with ls.

cd ~/sl_workshop_2024/demo_data
ls

You should see this as the output of your ls command:

Alignments     MLTrees        Phylodynamics  nanopore_reads phylo_data     reads          reference

Section 1 - Read quality control with FastQC

FastQC provides quality metrics for read files and shows the output in graphical and text formats.

Commands

fastqc reads/GAS_1.fastq.gz reads/GAS_2.fastq.gz

Outputs

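By default, FastQC writes its output files to the same directory as the input reads (here, reads/). One set of files is produced for each input file:

| Files | Description |
|---|---|
| GAS_1_fastqc.html, GAS_2_fastqc.html | Read characteristics in graphical format. Can be opened with a web browser such as Chrome |
| GAS_1_fastqc.zip, GAS_2_fastqc.zip | Zip file containing the results in text format |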

More detailed information about how to interpret the results can be found in the manual: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
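The zip archive also includes a summary.txt file with one tab-separated line per analysis module (status, module name, file name), where the status is PASS, WARN, or FAIL. A minimal sketch for listing only the modules that did not pass, assuming the default archive layout (a folder named after the input file, e.g. GAS_1_fastqc/summary.txt) and that unzip is installed; the function name fastqc_flags is hypothetical:

```shell
# fastqc_flags (hypothetical helper): read a FastQC summary.txt on stdin
# (tab-separated: status, module name, file name) and print every
# module whose status is not PASS.
fastqc_flags() {
    awk -F'\t' '$1 != "PASS" { print $1 ": " $2 }'
}

# Usage, assuming the zip file was written next to the input reads:
# unzip -p reads/GAS_1_fastqc.zip GAS_1_fastqc/summary.txt | fastqc_flags
```

WARN and FAIL flags are prompts to look at the corresponding plot in the HTML report, not automatic reasons to discard a run.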

Section 2 - Read trimming

Before we assemble, we'll do some very light read trimming. This step removes any Illumina adapter sequences that may have made it into the reads due to read-through of short library fragments. These adapter sequences can sometimes get incorporated into your assembly and cause misassemblies. Better to take the time to remove them prior to assembly.
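FastQC's Adapter Content module already reports adapter read-through, but a direct count can be a useful cross-check. A minimal sketch that counts reads whose sequence contains the first bases of the Illumina TruSeq adapter, AGATCGGAAGAGC; that this library used TruSeq-style adapters is an assumption (Nextera libraries, for example, use a different adapter), and count_adapter_reads is a hypothetical name:

```shell
# count_adapter_reads (hypothetical helper): count reads in a gzipped FASTQ
# file whose sequence line contains an adapter prefix.
# Assumes standard 4-line FASTQ records. The default prefix is the start of
# the Illumina TruSeq adapter, which is an assumption about this library.
count_adapter_reads() {
    local fastq=$1
    local adapter=${2:-AGATCGGAAGAGC}
    # NR % 4 == 2 selects the sequence line of each record;
    # index() returns 0 when the prefix is absent from that line
    gzip -dc "$fastq" \
        | awk -v a="$adapter" 'NR % 4 == 2 && index($0, a) { n++ } END { print n + 0 }'
}

# Usage:
# count_adapter_reads reads/GAS_1.fastq.gz
```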

We are going to use fastp to perform the trimming step. It can perform a number of functions, including adapter removal and the removal of low-quality sequences from the reads.
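Read quality is encoded in the fourth line of each FASTQ record, one character per base. To illustrate that encoding, here is a minimal sketch that decodes Phred+33 quality strings (the encoding used by Illumina 1.8 and later, assumed here) and prints the mean quality of each read. It is not fastp's filtering rule: fastp's default quality filter counts the fraction of bases below a quality threshold rather than using the mean. The name mean_phred is hypothetical:

```shell
# mean_phred (hypothetical helper): print the mean Phred quality of each read
# in an uncompressed FASTQ stream on stdin, assuming Phred+33 encoding,
# where a base's score is the ASCII code of its quality character minus 33.
mean_phred() {
    awk 'BEGIN {
            # lookup table from printable characters to their ASCII codes
            for (i = 33; i <= 126; i++) code[sprintf("%c", i)] = i
         }
         NR % 4 == 0 {
            # every 4th line of a FASTQ record is the quality string
            sum = 0
            for (j = 1; j <= length($0); j++) sum += code[substr($0, j, 1)] - 33
            printf "%.2f\n", (length($0) ? sum / length($0) : 0)
         }'
}

# Usage: mean quality of the first 10 reads
# gzip -dc reads/GAS_1.fastq.gz | head -n 40 | mean_phred
```

For example, the character I (ASCII 73) decodes to Q40 and ! (ASCII 33) to Q0.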

Commands

fastp \
    --in1 reads/GAS_1.fastq.gz \
    --in2 reads/GAS_2.fastq.gz \
    --out1 GAS_trimmed_paired_1.fastq.gz \
    --out2 GAS_trimmed_paired_2.fastq.gz \
    --unpaired1 GAS_trimmed_unpaired_1.fastq.gz \
    --unpaired2 GAS_trimmed_unpaired_2.fastq.gz \
    -h GAS_fastp.html \
    -j GAS_fastp.json \
    -w 1

Settings Used

| Setting | Description |
|---|---|
| --in1 / --in2 | Input read files |
| --out1 / --out2 | Paired output read files |
| --unpaired1 / --unpaired2 | Singleton read files (only one read of the pair passed filters) |
| -h | Filtering and trimming report, HTML format |
| -j | Filtering and trimming report, JSON format |
| -w | Number of parallel threads to use |

See the fastp manual (https://github.com/OpenGene/fastp) for more detail on these settings and other options.

Outputs

| Files | Description |
|---|---|
| GAS_trimmed_paired_1.fastq.gz & _2.fastq.gz | Paired reads remaining after trimming |
| GAS_trimmed_unpaired_1.fastq.gz & _2.fastq.gz | Unpaired reads remaining after trimming |
| GAS_fastp.html | Filtering and trimming report. Can be viewed with Firefox or another web browser |
| GAS_fastp.json | Filtering and trimming report in JSON format |
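Paired reads must stay in the same order in both files, so a quick sanity check after trimming is that the two paired output files contain the same number of reads. A minimal sketch, assuming standard 4-line FASTQ records with no line-wrapped sequences; count_reads and check_pairs are hypothetical helpers:

```shell
# count_reads (hypothetical helper): number of reads in a gzipped FASTQ file,
# assuming each record occupies exactly four lines.
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# check_pairs (hypothetical helper): report both read counts and return
# success only if the two files contain the same number of reads.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "$1: $n1 reads; $2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}

# Usage:
# check_pairs GAS_trimmed_paired_1.fastq.gz GAS_trimmed_paired_2.fastq.gz
```

Equal counts do not prove the pairs are in sync, but unequal counts mean something went wrong and the files should not be used for assembly.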


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
