Daily log - Sara-SL/GenomeAnalysis GitHub Wiki

30-03-2020

Finished my project plan that I had started working on the previous week.
Created my first working node to see how it worked.
Cloned the GitHub repository to the folder home/sarasl/git/GenomeAnalysis on UPPMAX and created a Data/ folder with a soft link to the data, which had already been downloaded and in part also trimmed. The data is only a subset (one contig) of the whole genome.

31-03-2020

Organized my wiki and created all pages
Realized that I had created symbolic links to /proj/g2020008/2_Eckalbar_2016/sel4 and not /proj/g2020008/nobackup/private/2_Eckalbar_2016/sel4, so I learned how to remove soft links and created a new one. Also added a soft link to sel4_NW_015503979.fna.gz, which is the contig that corresponds to the selected subset of the reads (sel4).
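A minimal sketch of replacing a wrong soft link, written in Python for illustration. The log does not say which commands were actually used, and the helper name `replace_symlink` is hypothetical. It refuses to touch anything that is not a symbolic link, so real data is never deleted by mistake.

```python
import os
import tempfile


def replace_symlink(target, link_path):
    """Point link_path at target, removing an existing symlink first."""
    if os.path.islink(link_path):
        # Removes only the link itself, never the data it points to.
        os.unlink(link_path)
    elif os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symlink")
    os.symlink(target, link_path)


if __name__ == "__main__":
    # Demo in a temporary directory; the folder names are placeholders.
    with tempfile.TemporaryDirectory() as tmp:
        wrong = os.path.join(tmp, "wrong_data")
        right = os.path.join(tmp, "right_data")
        os.mkdir(wrong)
        os.mkdir(right)
        link = os.path.join(tmp, "Data")
        replace_symlink(wrong, link)   # the first, mistaken link
        replace_symlink(right, link)   # the corrected link
        print(os.readlink(link) == right)
```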

Tried to figure out how to run FastQC
Ran FastQC on wgs_data by creating a script

01-04-2020

Renamed the wgs_data and rna_seq_data (raw & trimmed) by creating new soft links. Renamed by hand.

02-04-2020

Ran FastQC on wgs_data, raw rna_seq_data and trimmed rna_seq_data using the fastqc*.sh scripts
Started looking at the output files.

03-04-2020

2.5 h

Revised my project plan after feedback
Started writing the method for QC

Ran MultiQC on fastqc_wgs, trimmed fastqc_rna_seq and raw fastqc_rna_seq
Trimmed the raw data using Trimmomatic
Ran FastQC on the newly trimmed data
Compared the FastQC output of the raw rna_seq data with that of the newly trimmed rna_seq data; the trimmed data looked better.

07-04-2020

Started looking into how to write the config file for the SOAPdenovo software.

14-04-2020

Started writing the results and discussion on the 1. Preprocessing page

15-04-2020

Wrote the SOAPdenovo config file and tried to run SOAPdenovo-127mer, but it didn't work

16-04-2020

0.5 h

Ran SOAPdenovo-63mer and pushed the result to git. Pushing took a long time because the files are big. Some files could not be pushed since they were "larger than GitHub's recommended maximum file size of 50.00 MB" or "exceeds GitHub's file size limit of 100.00 MB".
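Files like these can be spotted before pushing. The sketch below walks a repository folder and flags files above the two sizes quoted in the messages. It assumes "MB" means 1024 * 1024 bytes, which may not match GitHub's exact accounting, and the function names are hypothetical.

```python
import os

# Thresholds taken from the quoted messages; the byte conversion is an assumption.
WARN_BYTES = 50 * 1024 * 1024    # "recommended maximum file size of 50.00 MB"
LIMIT_BYTES = 100 * 1024 * 1024  # "file size limit of 100.00 MB"


def classify_size(n_bytes):
    """Return 'blocked', 'warning' or 'ok' for a file size in bytes."""
    if n_bytes > LIMIT_BYTES:
        return "blocked"
    if n_bytes > WARN_BYTES:
        return "warning"
    return "ok"


def find_large_files(root):
    """Yield (path, size, status) for files under root that are not 'ok'."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # git stores a soft link as a link, not as the data
            size = os.path.getsize(path)
            status = classify_size(size)
            if status != "ok":
                yield path, size, status


if __name__ == "__main__":
    for path, size, status in find_large_files("."):
        print(f"{status}\t{size / (1024 * 1024):.1f} MB\t{path}")
```

Large assembly outputs found this way could be listed in .gitignore or compressed before committing.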

17-04-2020

Ran Nucmer and tried to run MUMmerplot, but it didn't work for some reason

21-04-2020

0.5 h

Ran the MUMmerplot job with an increased batch job time limit (it took 3.5 h) and looked at the output plot, which looked weird.

23-04-2020

2.5 h

Ran different MUMmerplot jobs and rearranged my repository to get a better structure for multiple runs and outputs.

Ran MUMmerplot with --layout and --filter and finally got a good result
Wrote a Python script to clean the data of short contigs
Cleaned the .contig file and the .scafSeq file using the Python script.
Ran Nucmer and MUMmerplot on the cleaned data
Looked at the plots from MUMmerplot
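The cleaning script itself is not shown in this log. As a rough sketch of the idea, the Python below drops every FASTA record shorter than a chosen minimum length. The function names, the file name in the usage line and the threshold are all hypothetical, and it assumes the .contig and .scafSeq files are plain FASTA.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_short(records, min_length):
    """Keep only records whose sequence has at least min_length bases."""
    return [(h, s) for h, s in records if len(s) >= min_length]


def write_fasta(records, out, width=60):
    """Write records as FASTA, wrapping sequences at width characters."""
    for header, seq in records:
        out.write(f">{header}\n")
        for i in range(0, len(seq), width):
            out.write(seq[i:i + width] + "\n")


if __name__ == "__main__":
    # Hypothetical usage: python clean_contigs.py assembly.contig 1000 > cleaned.fa
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as handle:
            kept = filter_short(read_fasta(handle), int(sys.argv[2]))
        write_fasta(kept, sys.stdout)
    else:
        print("usage: clean_contigs.py <fasta> <min_length>", file=sys.stderr)
```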

25-04-2020

Wrote the method and the start of the results on the 2. DNA assembly wiki page

28-04-2020

Planned to run Trinity but ended up running Bowtie and TopHat instead, to later use the output for a genome-guided Trinity run.

29-04-2020

Ran Trinity with the output from TopHat
Started looking into how to run MAKER
Tried to run the first step, about CEGMA, in this tutorial - didn't work.

04-05-2020

Tried to run the first step, about CEGMA, in this tutorial again - didn't work.
Started looking into this tutorial instead
Ran steps 1-5 in that tutorial

08-05-2020

Since UPPMAX was down, I couldn't run any software. Instead I prepared as much as I could so that I could run step 6 as soon as UPPMAX was up and running again, e.g. by creating the different Perl scripts and a batch script to run everything in step 6.
Worked on my wiki

10-05-2020

UPPMAX still down...

11-05-2020

Started looking into EggNOG and HTSeq a bit
UPPMAX started working again, so I kept running MAKER

14-05-2020

Struggled with running MAKER

15-05-2020

Tried to figure out how to run HTSeq
Got a lot of errors but eventually succeeded in running HTSeq after I had separated the paired-end and single-end reads.

18-05-2020

Tried to figure out how to interpret my HTSeq output
Wrote a Python script to count features
Ran TopHat for each sample (18 times)
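The feature-counting script is not included in this log, so the following is only a sketch of one way to summarize such output. It assumes the htseq-count table format: one tab-separated line per feature (feature ID, then read count), followed by special counter lines whose names start with `__` (for example `__no_feature`). The function name and the example data are made up.

```python
def summarize_counts(lines):
    """Summarize htseq-count style lines of the form 'feature_id<TAB>count'."""
    n_features = 0    # features listed in the table
    n_expressed = 0   # features with at least one read
    total_reads = 0   # reads assigned to any feature
    special = {}      # counters such as __no_feature, kept separately
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        name, count = line.split("\t")
        count = int(count)
        if name.startswith("__"):
            special[name] = count
            continue
        n_features += 1
        total_reads += count
        if count > 0:
            n_expressed += 1
    return {
        "features": n_features,
        "expressed": n_expressed,
        "assigned_reads": total_reads,
        "special": special,
    }


if __name__ == "__main__":
    # Made-up example data, not real results from this project.
    example = [
        "gene_A\t12",
        "gene_B\t0",
        "gene_C\t3",
        "__no_feature\t40",
        "__ambiguous\t2",
    ]
    print(summarize_counts(example))
```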

20-05-2020

Wrote an R script for differential expression analysis

24-05-2020

3.5 h

Worked on my wiki