Daily log - Sara-SL/GenomeAnalysis GitHub Wiki

30-03-2020

6h

  • Finished my project plan that I had started working on the previous week.
  • Created my first working node to see how it worked.
  • Cloned the GitHub repository to home/sarasl/git/GenomeAnalysis on UPPMAX and created the folder Data/ with a soft link to the data, which had already been downloaded and in part trimmed. The data is only a subset (one contig) of the whole genome.

31-03-2020

2h

  • Organized my wiki and created all pages
  • Realized that I had created symbolic links to /proj/g2020008/2_Eckalbar_2016/sel4 and not /proj/g2020008/nobackup/private/2_Eckalbar_2016/sel4, so I learned how to remove soft links and created a new one. Also added a soft link to sel4_NW_015503979.fna.gz, which is the contig that corresponds to the selected subset of the reads (sel4).
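
For reference, a minimal Python sketch of replacing a soft link that points at the wrong place. This is not the command used on the day (that was done by hand in the shell); the function name `replace_symlink` and the link name in the usage note are hypothetical.

```python
import os
from pathlib import Path


def replace_symlink(link_path, new_target):
    """Point link_path at new_target, removing an old symlink first.

    Hypothetical helper, for illustration only.
    """
    link = Path(link_path)
    if link.is_symlink():
        # unlink() removes only the link itself, never the data it points to
        link.unlink()
    elif link.exists():
        # Refuse to touch a real file or directory
        raise FileExistsError(f"{link} exists and is not a symlink")
    os.symlink(new_target, link)
    return link
```

Assuming the link lives at Data/sel4 (the actual link name is not recorded in this log), the fix would be `replace_symlink("Data/sel4", "/proj/g2020008/nobackup/private/2_Eckalbar_2016/sel4")`.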

2h

  • Tried to figure out how to run FastQC
  • Wrote a script and ran FastQC on wgs_data

01-04-2020

1h

  • Renamed the wgs_data and rna_seq_data (raw and trimmed) files by creating new soft links. Renamed by hand.

02-04-2020

1h

  • Ran FastQC on wgs_data, raw rna_seq_data and trimmed rna_seq_data using the fastqc*.sh scripts
  • Started looking at the output files.

03-04-2020

2,5h

  • Revised my project plan after feedback
  • Started writing the method for QC

4h

  • Ran MultiQC on fastqc_wgs, trimmed fastqc_rna_seq and raw fastqc_rna_seq
  • Trimmed the raw data using Trimmomatic
  • Ran FastQC on the newly trimmed data
  • Compared the FastQC output of the raw rna_seq data with that of the newly trimmed rna_seq data; the trimmed data looked better.

07-04-2020

3h

  • Started looking into how to write the config file for SOAPdenovo.

14-04-2020

2h

  • Started writing the results and discussion on the 1.Preprocessing page

15-04-2020

4h

  • Wrote the SOAPdenovo config file and tried to run SOAPdenovo-127mer, which didn't work.

16-04-2020

0,5h

  • Ran SOAPdenovo-63mer and pushed the result to git. Pushing took a long time because the files were big. Some files could not be pushed since they were "larger than GitHub's recommended maximum file size of 50.00 MB" or "exceeds GitHub's file size limit of 100.00 MB".
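
A small sketch of how oversized files could be spotted before pushing. It was not part of the original workflow; the function name and the assumption that GitHub's "MB" means 1024 × 1024 bytes are mine.

```python
import os

# Thresholds taken from the GitHub messages quoted above.
# Assumption: 1 MB = 1024 * 1024 bytes.
RECOMMENDED_MAX = 50 * 1024 * 1024
HARD_LIMIT = 100 * 1024 * 1024


def find_large_files(root, threshold=RECOMMENDED_MAX):
    """Return (path, size_in_bytes) for files under root larger than threshold.

    Hypothetical helper. Skips the .git directory and symbolic links.
    """
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        if ".git" in dirnames:
            dirnames.remove(".git")  # do not descend into git internals
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # soft links to the data are tiny in the repository
            size = os.path.getsize(path)
            if size > threshold:
                hits.append((path, size))
    # Largest files first
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling `find_large_files(".")` in the repository root lists candidates for .gitignore; passing `threshold=HARD_LIMIT` lists only the files GitHub would reject outright.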

17-04-2020

4h

  • Ran Nucmer and tried to run MUMmerplot, but it didn't work for some reason

21-04-2020

0,5h

  • Ran the MUMmerplot job with an increased batch job time limit (took 3,5 h) and looked at the output plot, which looked weird.

23-04-2020

2,5h

  • Ran different MUMmerplot jobs and rearranged my repository to get a better structure for multiple runs and outputs.

4h

  • Ran MUMmerplot with --layout and --filter and finally got a good result
  • Wrote a Python script to remove short contigs from the data
  • Cleaned the .contig file and the .scafSeq file using the Python script.
  • Ran Nucmer and MUMmerplot on the cleaned data
  • Looked at the plots from MUMmerplot
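
The cleaning step can be sketched roughly as below. This is not the actual script from the repository, only a minimal illustration that assumes the .contig and .scafSeq files are plain FASTA. The function names and the `min_length` cutoff are hypothetical; the cutoff that was actually used is not recorded in this log.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short(records, min_length):
    """Keep only records whose sequence has at least min_length bases."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def write_fasta(records, handle, width=60):
    """Write records to an open text handle, wrapping sequences at width."""
    for header, seq in records:
        handle.write(f">{header}\n")
        for start in range(0, len(seq), width):
            handle.write(seq[start:start + width] + "\n")
```

A run would open the assembly output, pass the file handle to `read_fasta`, filter with the chosen cutoff and write the result to a new file, leaving the original untouched.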

25-04-2020

3h

  • Wrote the method and the start of the results on the 2. DNA assembly wiki page

28-04-2020

4h

  • Planned to run Trinity but ended up running Bowtie and TopHat instead, to later use the output for genome-guided Trinity.

29-04-2020

4h

  • Ran Trinity with the output from TopHat
  • Started looking into how to run Maker
  • Tried to run the first step, about CEGMA, in this tutorial, but it didn't work.

04-05-2020

4h

  • Tried to run the first step, about CEGMA, in this tutorial again, but it still didn't work.
  • Started looking into this tutorial instead
  • Ran steps 1-5 in that tutorial

08-05-2020

4h

  • Since UPPMAX was down I couldn't run any software. Instead I prepared as much as I could to be able to run step 6 as soon as UPPMAX was up and running again, e.g. created the different Perl scripts and a batch script to run everything in step 6.
  • Worked on my wiki

10-05-2020

  • UPPMAX still down...

11-05-2020

4h

  • Started looking into EggNOG and HTSeq a bit
  • UPPMAX started working again, so I kept running Maker

14-05-2020

4h

  • Struggled with running Maker

15-05-2020

4h

  • Tried to figure out how to run HTSeq
  • Got a lot of errors but eventually succeeded in running HTSeq after I had separated the paired-end and single-end reads.

18-05-2020

7h

  • Tried to figure out how to interpret my HTSeq output
  • Wrote a Python script to count features
  • Ran TopHat for each sample (18 times)
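
As an illustration of what counting features in the HTSeq output could look like, here is a minimal sketch. It is not the script from the repository, and what exactly that script counted is not recorded here. It assumes the tab-separated htseq-count output format (feature ID, then read count), where the summary rows at the end start with two underscores, e.g. `__no_feature` and `__ambiguous`. The function name is hypothetical.

```python
def summarize_htseq(lines):
    """Summarize htseq-count output given as an iterable of text lines.

    Hypothetical helper. Returns the number of features, the number of
    features with at least one read, the total assigned reads, and the
    special "__" counters reported by htseq-count.
    """
    counts, special = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, value = line.split("\t")
        if name.startswith("__"):
            special[name] = int(value)  # e.g. __no_feature, __ambiguous
        else:
            counts[name] = int(value)
    return {
        "features": len(counts),
        "expressed": sum(1 for c in counts.values() if c > 0),
        "assigned_reads": sum(counts.values()),
        "special": special,
    }
```

Running this once per sample gives a quick check of how many reads ended up in `__no_feature` or `__ambiguous` compared with the reads assigned to genes.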

20-05-2020

7h

  • Wrote R script for differential expression analysis

24-05-2020

3,5h

  • Worked on my wiki