Daily Log - dogayalova/Genome-Analysis-1MB462-67615-VT2025 GitHub Wiki

  • 06/4 - Worked on the details of the project plan and prepared for the 07/4 lab session: running the quality check of the reads for genome assembly (FastQC) and the genome assembly of the PacBio reads with Canu.
  • 07/4 - Worked on the PacBio genome assembly. Since PacBio reads don't need trimming and Canu already handles read quality at the start of its analysis (in its correction and trimming stages), I prepared my .sh file directly and submitted the job to the UPPMAX Snowy cluster under the project "uppmax2025-3-3". Second, I ran the initial (pre-trimming) quality check with FastQC on the Illumina reads for the Illumina/Nanopore assembly (extra analysis). Finally, I created "code" and "results" folders in my main project directory ([doya3905@rackham3 Genome-Analysis-1MB462-67615-VT2025]$), because when I tried to push the code (.sh file) to GitHub, the whole folder was pushed with it. I decided to keep code and results separate and push only the code folder at the end.
  • 08/4 - Worked on trimming the Illumina reads and completed the run. With the parameters I used, the trimmed reads showed worse quality than the untrimmed ones. Second, I ran QUAST on the PacBio assembly. Third, I annotated the PacBio assembly with Prokka. Additionally, I reorganized my directory: I created general "results" and "code" folders, moved every file accordingly, and pushed the "code" folder to GitHub. I also worked on the analysis section of my GitHub wiki, formatting the code blocks so they render more cleanly. I visualized the PacBio assembly annotation in Artemis and made a MUMmer dot plot for synteny comparison of the PacBio assembly with E. faecium and E. faecalis.
  • 09/4 - Wasn't happy with the synteny comparison tools I had used, so I tried Mauve today. I also ran the SPAdes assembly with the Illumina/Nanopore reads and checked its quality with QUAST. The resulting assembly was of very poor quality. However, the Illumina reads were already of better quality before trimming than after, so I concluded that I had trimmed too harshly and made the reads too short.
  • 10/4 - Concluded that the Illumina reads are already fine, so I ran SPAdes with the untrimmed reads. However, the assembly from the untrimmed reads was also of poor quality, so I decided to map the RNA reads to the PacBio assembly. I also ran an ACT synteny comparison for the PacBio assembly, and I changed the MUMmer plot from E. faecalis vs. E. faecium to E. faecium vs. its reference assembly from the NCBI database. That comparison showed clear diagonal lines on the plot, which let me further evaluate the quality of my assembly. With this, I completed the genome assembly part. I will continue with the RNA reads in the next sessions.
  • 11/4 - Worked on the questions required for grades 4 and 5.
  • 12/4 - Tried to run the initial FastQC check on the RNA reads, both BH and Serum. It didn't work; will continue.
  • 16/4 - Ran the initial FastQC check on the untrimmed and trimmed folders of the Serum reads, and on all BH reads (which were all already trimmed). Edited my wiki to organize the information better.
  • 22/4 - Performed trimming on both the BH and Serum RNA reads, then ran FastQC on the trimmed RNA reads. In addition, I ran BLAST on the command line to identify the plasmids assembled by Canu.
  • 25/4 - The command-line BLAST produced output that was hard to interpret, so I ran BLAST on the NCBI website instead. This worked, and I reported my plasmid identification findings on the wiki page.
  • 28/4 - Performed RNA read mapping for both BH and Serum samples using double-trimmed paired-end reads.
  • 2/5 - Prepared .sh files for counting the reads mapped to genes with htseq-count, the input for the differential expression analysis. I prepared separate .sh files for the BH and Serum samples. I also ran into disk space problems with the Serum mapping and tried to solve them. Once I have all my .bam files, I will run the HTSeq analysis.
  • 4/5 - Worked on HTSeq and DESeq2 .sh files.
  • 5/5 - Corrected the HTSeq scripts and switched the GFF-to-GTF conversion to AGAT. I will work on DESeq2 tomorrow.
  • 6/5 - HTSeq didn't give good results, so I tried the reverse strandedness option (-s reverse) in the analysis.
  • 7/5 - Finally understood the problem with the HTSeq analysis: Prokka appends the contig sequences in FASTA format to the end of its .gff output. I removed that section and used the appropriate GFF feature and attribute tags, so I no longer needed to convert .gff to .gtf. Hopefully the results will turn out well.
  • 8/5 - HTSeq results turned out well, and meanwhile I worked on the interpretation of my results on the wiki.
  • 9/5 - Performed the final DESeq2 analysis and obtained a CSV file along with a volcano plot and an MA plot. Next, I will interpret all the results and compare them to the paper; I'm working on this in today's lab.
  • 14/5 - Having finished all the analyses plus two extra ones, I have been working on my wiki since then.
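The 7/5 fix (removing the FASTA section that Prokka appends to its .gff output) can be sketched in a few lines of Python. In GFF3, a "##FASTA" directive means everything after it is sequence data rather than features, so keeping only the lines before it leaves a feature-only annotation. This is a minimal sketch, not the exact command used in this project; the helper names and the example lines are hypothetical.

```python
def strip_gff_fasta(lines):
    """Return only the GFF3 lines that come before a '##FASTA' directive.

    In GFF3, everything after '##FASTA' is sequence data, not features.
    Prokka's .gff output ends with such a section.
    """
    kept = []
    for line in lines:
        if line.startswith("##FASTA"):
            break  # stop: the remainder of the file is FASTA sequence
        kept.append(line)
    return kept


def strip_gff_file(in_path, out_path):
    """Hypothetical helper: write a copy of in_path without the FASTA section."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(strip_gff_fasta(src))


if __name__ == "__main__":
    # Made-up example lines, not real Prokka output.
    example = [
        "##gff-version 3\n",
        "contig_1\t.\tCDS\t1\t9\t.\t+\t0\tID=gene_0001\n",
        "##FASTA\n",
        ">contig_1\n",
        "ATGAAATAA\n",
    ]
    for line in strip_gff_fasta(example):
        print(line, end="")
```

Stopping at the directive, rather than filtering out lines that look like sequence, avoids having to guess which lines are FASTA.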
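The HTSeq entries (2/5 to 7/5) involve three htseq-count settings: the strandedness ("-s reverse" on 6/5), and the feature type and ID attribute to read from a Prokka GFF instead of converting it to GTF (7/5). The sketch below only builds the argument list, so the choices are easy to inspect. The values are assumptions not stated in the log: a position-sorted BAM ("-r pos"), counting "CDS" features ("-t"), and using the GFF3 "ID" attribute as the gene identifier ("-i"). The file names are hypothetical.

```python
import shlex


def htseq_count_args(bam_path, gff_path, stranded="reverse",
                     feature_type="CDS", id_attr="ID"):
    """Build an htseq-count argument list for a FASTA-free Prokka GFF3 file.

    Assumed (not from the log): BAM sorted by position, CDS features counted,
    genes identified by the GFF3 'ID' attribute.
    """
    if stranded not in ("yes", "no", "reverse"):
        raise ValueError("stranded must be 'yes', 'no' or 'reverse'")
    return [
        "htseq-count",
        "-f", "bam",         # alignments are BAM, not SAM
        "-r", "pos",         # BAM sorted by coordinate, not by read name
        "-s", stranded,      # library strandedness
        "-t", feature_type,  # GFF column 3 feature type to count
        "-i", id_attr,       # attribute used as the gene identifier
        bam_path,
        gff_path,
    ]


if __name__ == "__main__":
    # Hypothetical file names; redirect stdout to a counts file when running it.
    print(shlex.join(htseq_count_args("BH_sample1.bam", "annotation.gff")))
```

htseq-count's defaults are "-t exon" and "-i gene_id", which suit GTF files. That is why the type and ID attribute must be set explicitly to use the GFF directly.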
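For interpreting the DESeq2 CSV and volcano plot from 9/5, the sketch below reads a results table and returns the points a volcano plot highlights: log2 fold change on the x-axis and -log10(padj) on the y-axis, for genes that pass both cutoffs. The cutoffs (padj < 0.05, |log2FoldChange| >= 1) are common choices assumed here, not ones taken from this project. The sketch also assumes the CSV has DESeq2's "log2FoldChange" and "padj" columns with gene IDs in the first column. The example rows are made up.

```python
import csv
import io
import math


def volcano_hits(csv_text, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (gene, log2FoldChange, -log10(padj)) for genes passing both cutoffs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    gene_col = reader.fieldnames[0]  # row-name column; its header is often empty
    hits = []
    for row in reader:
        padj, lfc = row["padj"], row["log2FoldChange"]
        if padj in ("NA", "") or lfc in ("NA", ""):
            continue  # DESeq2 writes NA when a value could not be computed
        padj_val, lfc_val = float(padj), float(lfc)
        if padj_val < padj_cutoff and abs(lfc_val) >= lfc_cutoff:
            # Guard against padj == 0, where -log10 is undefined.
            y = math.inf if padj_val == 0 else -math.log10(padj_val)
            hits.append((row[gene_col], lfc_val, y))
    return hits


# Made-up values in the column layout of a DESeq2 results table.
EXAMPLE_CSV = '''"",baseMean,log2FoldChange,lfcSE,stat,pvalue,padj
"gene_a",120.5,2.5,0.3,8.3,1e-16,1e-14
"gene_b",50.1,0.2,0.3,0.6,0.5,0.7
"gene_c",10.2,-1.8,0.5,-3.6,0.0003,0.01
"gene_d",0,NA,NA,NA,NA,NA
'''

if __name__ == "__main__":
    for gene, lfc, y in volcano_hits(EXAMPLE_CSV):
        print(gene, lfc, round(y, 2))
```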