1D Short Read Quality Control and Trimming - NU-CPGME/sl_workshop_2024 GitHub Wiki
July / August, 2024
Developed by:
Egon A. Ozer, MD PhD
Ramon Lorenzo Redondo, PhD
Before we start:
micromamba activate assembly
Using `cd`, navigate to the folder with our example data, `demo_data`, and list the contents with `ls`.
cd ~/sl_workshop_2024/demo_data
ls
You should see this as the output of your `ls` command:
Alignments MLTrees Phylodynamics nanopore_reads phylo_data reads reference
Section 1 - Read quality control with FastQC

FastQC provides quality metrics for read files, such as per-base quality scores, GC content, and adapter content, and shows the output in graphical and text formats.
Commands
fastqc reads/GAS_1.fastq.gz reads/GAS_2.fastq.gz
Outputs
Files | Description |
---|---|
GAS_1_fastqc.html, GAS_2_fastqc.html | Read characteristics in graphical format, one report per input file. Can be opened with a web browser like Chrome |
GAS_1_fastqc.zip, GAS_2_fastqc.zip | Zip files containing the same results in text format |
FastQC writes these files to the same folder as the input reads (here, `reads/`). More detailed information about how to interpret the results can be found in the manual: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
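FastQC's "Per base sequence quality" plot is built from the quality string on the fourth line of each FASTQ record. As a rough illustration of how those characters become numbers, the sketch below decodes a quality string assuming Phred+33 encoding (what FastQC's Basic Statistics reports as "Sanger / Illumina 1.9" for current Illumina data). The function name `phred33_scores` is our own, not part of FastQC. A score Q corresponds to an estimated error probability of 10^(-Q/10), so Q30 means roughly 1 wrong base call in 1,000.

```shell
# Decode a FASTQ quality string into numeric Phred scores, assuming
# Phred+33 encoding (ASCII code of each character minus 33).
# phred33_scores is a hypothetical helper written for this workshop page.
phred33_scores() {
  # od prints the decimal ASCII code of every byte; awk subtracts the
  # offset and joins the scores with single spaces.
  printf '%s' "$1" |
    od -An -v -tu1 |
    awk '{ for (i = 1; i <= NF; i++) printf "%s%d", (n++ ? " " : ""), $i - 33 }
         END { print "" }'
}
```

For example, `phred33_scores 'I#5?'` prints `40 2 20 30`: `I` is Q40 (about 1 error in 10,000 bases), while `#` is Q2, an essentially unreliable base call.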
Before we assemble, we'll do some very light read trimming. This step removes any Illumina adapter sequences that may have made it into the reads due to read-through of short library fragments. These adapter sequences can sometimes get incorporated into your assembly and cause misassemblies, so it is better to take the time to remove them before assembly.
We are going to use fastp to perform the trimming step. It can perform a number of functions, including removing adapter sequences and removing low-quality sequences from the reads.
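To make read-through concrete: when a library fragment is shorter than the read length, the sequencer keeps reading past the end of the insert into the adapter, so adapter sequence appears at the read's 3' end, sometimes only partially. The sketch below trims such a tail by exact matching against an adapter prefix. It is a simplified illustration, not fastp's method: real trimmers tolerate sequencing errors in the adapter, and for paired-end data fastp can also detect adapters from the overlap between read 1 and read 2. The function name `trim_3prime_adapter` and the minimum-overlap default of 3 are our own choices; `AGATCGGAAGAGC` is the common prefix of the standard Illumina (TruSeq) adapters.

```shell
# Simplified 3' adapter trimming by exact matching (illustration only).
#   $1 = read sequence
#   $2 = adapter sequence
#   $3 = minimum overlap required to trim a partial adapter (default 3)
# Prints the read with everything from the first adapter match onward removed.
# trim_3prime_adapter is a hypothetical helper, not a fastp command.
trim_3prime_adapter() {
  awk -v sequence="$1" -v adapter="$2" -v min_overlap="${3:-3}" 'BEGIN {
    slen = length(sequence)
    alen = length(adapter)
    # Scan start positions left to right so the earliest adapter match wins.
    for (i = 1; i <= slen - min_overlap + 1; i++) {
      tail = substr(sequence, i)
      # Compare the read tail with the adapter over their shared length:
      # this catches a full adapter (tail longer) or a partial one (tail shorter).
      n = (length(tail) < alen) ? length(tail) : alen
      if (substr(tail, 1, n) == substr(adapter, 1, n)) {
        print substr(sequence, 1, i - 1)
        exit
      }
    }
    # No adapter found: return the read unchanged.
    print sequence
  }'
}
```

For example, `trim_3prime_adapter ACGTACGTACGTAGATC AGATCGGAAGAGC` prints `ACGTACGTACGT`, removing a 5-base partial adapter. Tails shorter than the minimum overlap are left alone, because a couple of bases matching the adapter by chance is common; raising the minimum trades missed short adapter fragments for fewer false trims. A read that is entirely adapter (an adapter dimer) comes back empty, which a length filter such as fastp's would then discard.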
Commands
fastp \
--in1 reads/GAS_1.fastq.gz \
--in2 reads/GAS_2.fastq.gz \
--out1 GAS_trimmed_paired_1.fastq.gz \
--out2 GAS_trimmed_paired_2.fastq.gz \
--unpaired1 GAS_trimmed_unpaired_1.fastq.gz \
--unpaired2 GAS_trimmed_unpaired_2.fastq.gz \
-h GAS_fastp.html \
-j GAS_fastp.json \
-w 1
Settings Used
Setting | Description |
---|---|
--in1 / --in2 | Input read files |
--out1 / --out2 | Paired output read files |
--unpaired1 / --unpaired2 | Singleton read files (only one read of the pair passed filters) |
-h | Filtering and trimming report, HTML format |
-j | Filtering and trimming report, JSON format |
-w | Number of parallel threads to use |
See the fastp manual (https://github.com/OpenGene/fastp) for more detail on these settings and other options.
Outputs
Files | Description |
---|---|
GAS_trimmed_paired_1.fastq.gz & _2.fastq.gz | Paired reads remaining after trimming |
GAS_trimmed_unpaired_1.fastq.gz & _2.fastq.gz | Unpaired reads remaining after trimming (their mates were removed by the filters) |
GAS_fastp.html | Filtering and trimming report. Can be viewed with Firefox or another web browser |
GAS_fastp.json | Filtering and trimming report in JSON format |
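A quick sanity check after trimming is to count how many reads each output file holds. The sketch below assumes gzip-compressed input and the standard four-line FASTQ record (header, sequence, `+`, qualities), which is what Illumina instruments and fastp write. `count_reads` is a hypothetical helper name, not a fastp feature; the same totals also appear in the fastp HTML and JSON reports.

```shell
# Count reads in a gzip-compressed FASTQ file, assuming the standard
# four-line record layout (header, sequence, "+", qualities).
# count_reads is a hypothetical helper written for this workshop page.
count_reads() {
  # Decompress to stdout and divide the total line count by 4.
  gzip -dc "$1" | awk 'END { print NR / 4 }'
}
```

For example, `count_reads GAS_trimmed_paired_1.fastq.gz` and `count_reads GAS_trimmed_paired_2.fastq.gz` should print the same number, since the paired outputs keep mates together. A fractional result points to a truncated or malformed file.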
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.