Analysis 2: Trimmomatic - cecilia-andersson/Genome_Analysis_Project GitHub Wiki
As inputs, I used the forward and reverse read files for each site, trimming each site separately. The output was four trimmed FASTQ files per site: paired and unpaired forward reads plus paired and unpaired reverse reads, for both D1 and D3.
Parameters:
PE (selected for paired-end reads)
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepbothreads (to trim Illumina-specific adapters and keep both reads)
LEADING: 3
TRAILING: 3
MINLEN: 36
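To show how these parameters fit together on the command line, here is a minimal sketch that assembles a Trimmomatic paired-end call. The executable name `trimmomatic`, the input file names, and the output naming scheme are assumptions for illustration; only the trimming steps come from the list above.

```python
import shlex


def build_trimmomatic_pe_command(fwd_in, rev_in, out_prefix):
    """Assemble a Trimmomatic PE command as an argument list.

    The output file names are hypothetical; Trimmomatic PE expects them in
    the order forward paired, forward unpaired, reverse paired, reverse unpaired.
    """
    outputs = [
        f"{out_prefix}_forward_paired.fastq.gz",
        f"{out_prefix}_forward_unpaired.fastq.gz",
        f"{out_prefix}_reverse_paired.fastq.gz",
        f"{out_prefix}_reverse_unpaired.fastq.gz",
    ]
    # Trimming steps exactly as listed in the parameters above.
    steps = [
        "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepbothreads",
        "LEADING:3",
        "TRAILING:3",
        "MINLEN:36",
    ]
    # "trimmomatic" assumes a wrapper script on PATH (an assumption here).
    return ["trimmomatic", "PE", fwd_in, rev_in, *outputs, *steps]


# Hypothetical input names for site D1.
cmd = build_trimmomatic_pe_command("D1_1.fastq.gz", "D1_2.fastq.gz", "D1")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the file names match the real data; running it once per site mirrors the separate trimming of D1 and D3.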
FASTQC Analysis:
D1 Paired forward
[Image: FastQC screenshot, "Screen Shot 2023-05-16 at 4 14 20 PM"]
D3 Paired forward
[Image: FastQC screenshot, "Screen Shot 2023-05-16 at 4 14 49 PM"]
- How many reads have been discarded after trimming? In location D1, 3913 reads; in location D3, 3089 reads.
- How can this affect your future analyses and results? On the negative side, discarding data can reduce the amount of information available to later analyses, especially when those analyses could filter out poor-quality reads themselves. On the positive side, removing sequences that are very likely to be faulty is important, because such reads can skew later analyses and amplify errors.
- How is the quality of your data after trimming? After trimming, the quality of the RNA data improved significantly according to FastQC's analyses. Per base sequence quality improved for all four (D1 forward, D1 reverse, D3 forward, D3 reverse) samples.
- What do the LEADING, TRAILING and SLIDINGWINDOW options do? The LEADING option trims bases below a chosen quality score from the beginning of a read, whereas the TRAILING option trims bases below a chosen quality score from the end of a read. The SLIDINGWINDOW option scans the read from the start with a window of several bases at a time, and cuts the read once the average quality within the window falls below a chosen threshold.
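
The three options described in the last answer can be sketched in a few lines of Python. This is a simplified illustration of the idea, operating on a list of per-base quality scores; it is not Trimmomatic's actual implementation, which differs in details. The function names and the example scores are made up for this sketch.

```python
def leading(quals, threshold):
    """Drop bases from the start of the read while quality is below threshold."""
    start = 0
    while start < len(quals) and quals[start] < threshold:
        start += 1
    return quals[start:]


def trailing(quals, threshold):
    """Drop bases from the end of the read while quality is below threshold."""
    end = len(quals)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return quals[:end]


def sliding_window(quals, window, threshold):
    """Cut the read at the first window whose mean quality is below threshold.

    Simplification: everything from the start of the failing window is removed.
    """
    for i in range(len(quals) - window + 1):
        if sum(quals[i:i + window]) / window < threshold:
            return quals[:i]
    return quals


# Made-up quality scores for one read: poor at both ends, good in the middle.
quals = [2, 2, 30, 32, 35, 34, 12, 8, 5, 2]

after_leading = leading(quals, 3)
after_trailing = trailing(quals, 3)
after_window = sliding_window(quals, 4, 15)
print(after_leading, after_trailing, after_window)
```

In this example LEADING and TRAILING only remove the bases scoring 2 at the ends, while the sliding window also removes the gradual quality decline toward the end of the read, because it looks at the window average rather than single bases.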