Analysis 1: Pre Trim Fast QC - cecilia-andersson/Genome_Analysis_Project GitHub Wiki
1. Copied soft link to data into folder
2. Ran FastQC with default parameters on the paired-end reads; this took about 5 minutes
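A minimal sketch of how step 2 could be scripted. The file names and the `build_fastqc_command` / `run_fastqc` helpers are hypothetical and are not the commands actually run for this project. FastQC analyzes each file on its own, so the forward and reverse files of a pair are simply passed together.

```python
import shutil
import subprocess
from typing import List, Optional


def build_fastqc_command(fastq_paths: List[str], outdir: Optional[str] = None) -> List[str]:
    """Build a FastQC command line that otherwise uses default parameters."""
    command = ["fastqc"]
    if outdir is not None:
        # --outdir is a real FastQC option; the directory must already exist.
        command += ["--outdir", outdir]
    command += list(fastq_paths)
    return command


def run_fastqc(fastq_paths: List[str], outdir: Optional[str] = None) -> None:
    """Run FastQC, failing early if the executable is not on PATH."""
    if shutil.which("fastqc") is None:
        raise RuntimeError("fastqc was not found on PATH")
    subprocess.run(build_fastqc_command(fastq_paths, outdir), check=True)


if __name__ == "__main__":
    # Hypothetical file names for one paired-end sample.
    reads = ["sample_R1.fastq.gz", "sample_R2.fastq.gz"]
    print(" ".join(build_fastqc_command(reads)))
```

Calling `run_fastqc(reads)` would write one HTML report and one zip archive per input file, next to the inputs unless an output directory is given.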
- DNA Quality:
- RNA (untrimmed) Quality:
[Image: Screen Shot 2023-05-16 at 3 25 28 PM]
DNA from site D1
[Image: Screen Shot 2023-05-16 at 3 31 52 PM]
DNA from site D3
[Image: Screen Shot 2023-05-16 at 3 36 46 PM]
RNA from site D1
RNA from site D3
The RNA data is also flagged for adapter content and "overrepresented sequences," which on inspection are mostly adapter sequences. Although leftover adapters can affect downstream analyses, I researched when adapter trimming is appropriate and decided to do only minimal trimming. The RNA-seq data in this study is used for expression analyses, so it would be detrimental to erroneously remove sequences that are genuinely overrepresented rather than adapters. Additionally, BWA-MEM performs local alignment and soft-clips read ends that do not match the reference, so it tolerates some leftover adapter sequence without explicit trimming. (Source: https://dnatech.genomecenter.ucdavis.edu/faqs/when-should-i-trim-my-illumina-reads-and-how-should-i-do-it/#:~:text=In%20case%20you%20are%20sequencing,pseudo%2Daligners%20should%20be%20used. , https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671312/).
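- What is the structure of a FASTQ file? Each read in a FASTQ file is a four-line record: a header line starting with "@" that holds the sequence ID, the raw sequence (A, C, G, T, or N for an uncalled base), a separator line starting with "+", and a quality string with one character per base indicating the confidence of each base call.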
- How is the quality of the data stored in the FASTQ files? How are paired reads identified? Quality is stored as Phred+33-encoded scores: each base has one ASCII character, and its character code minus 33 is the Phred score. Paired reads are stored in two FASTQ files, usually labeled R1 (forward) and R2 (reverse). Mates appear in the same order in both files and share the same read ID in the header; typically only the read-number field differs (for example "/1" vs "/2").
- What can generate the issues you observe in your data? Can these cause any problems during subsequent analyses? With Illumina reads, sequencing issues can arise from phasing (as discussed above), signal decay (which also affects the ends of reads), and physical problems such as overclustering and instrument errors.
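The FASTQ answers above can be illustrated with a short sketch. It assumes plain-text, four-line records with no line wrapping. The two reads are made-up examples, and `parse_fastq`, `phred33_scores`, `error_probability` and `pair_key` are hypothetical helper names, not part of any library.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    read_id: str   # header line without the leading "@"
    sequence: str  # raw bases
    quality: str   # Phred+33 encoded string, one character per base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # skip blank lines between or after records
        header = header.rstrip("\n")
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 string: score = ASCII code - 33."""
    return [ord(char) - 33 for char in quality]


def error_probability(score: int) -> float:
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-score / 10)


def pair_key(read_id: str) -> str:
    """Reduce a read ID to the part shared by both mates of a pair."""
    name = read_id.split()[0]  # drop any comment after whitespace
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]       # drop the mate suffix
    return name


# Made-up forward (R1) and reverse (R2) records for one read pair.
r1 = next(parse_fastq(io.StringIO("@read1/1\nACGT\n+\nII5!\n")))
r2 = next(parse_fastq(io.StringIO("@read1/2\nTTGA\n+\nI#II\n")))

print(phred33_scores(r1.quality))                      # [40, 40, 20, 0]
print(pair_key(r1.read_id) == pair_key(r2.read_id))    # True
```

A Phred score of 20 corresponds to a 1% chance that the base call is wrong, and 40 to 0.01%, which is the scale behind FastQC's per-base quality plot.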