Daily log - Sara-SL/GenomeAnalysis GitHub Wiki
30-03-2020
6h
- Finished my project plan that I had started working on the previous week.
- Created my first working node to see how it worked.
- Cloned the GitHub repository to home/sarasl/git/GenomeAnalysis on UPPMAX and created a Data/ folder with softlinks to the data, which had already been downloaded and some of which had also been trimmed. The data are only a subset (one contig) of the whole genome.
31-03-2020
2h
- Organized my wiki and created all pages
- Realized that I had created symbolic links to /proj/g2020008/2_Eckalbar_2016/sel4 instead of /proj/g2020008/nobackup/private/2_Eckalbar_2016/sel4, so I learned how to remove softlinks and created a new one. Also added a softlink to sel4_NW_015503979.fna.gz, the contig that corresponds to the selected subset of the reads (sel4).
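Removing a softlink deletes only the link, not the data it points to. A minimal Python sketch of replacing a wrongly pointed link (the helper name `replace_symlink` is hypothetical; on the command line the same thing is `rm` on the link followed by `ln -s TARGET LINK`):

```python
from pathlib import Path


def replace_symlink(link: Path, new_target: Path) -> None:
    """Make `link` a symlink to `new_target`, replacing an existing symlink."""
    if link.is_symlink():
        # Unlinking a symlink removes only the link, never the data it points to
        link.unlink()
    elif link.exists():
        # Refuse to delete a real file or directory by mistake
        raise FileExistsError(f"{link} exists and is not a symlink")
    link.symlink_to(new_target)
```

For example, `replace_symlink(Path("Data/sel4"), Path("/proj/g2020008/nobackup/private/2_Eckalbar_2016/sel4"))`, where the link name `Data/sel4` is only an illustration.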
2h
- Tried to figure out how to run FastQC
- Ran FastQC on wgs_data by writing a script
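The FastQC script itself is not shown here. As an illustration only, a Python sketch that builds and runs a FastQC command for a set of files, assuming `fastqc` is on the PATH (e.g. after loading the module on UPPMAX) and using FastQC's `-o` (output directory) and `-t` (threads) options; the function names are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def fastqc_command(fastq_files: Sequence[Path], outdir: Path, threads: int = 2) -> List[str]:
    """Build the FastQC command line: output directory, thread count, then input files."""
    return ["fastqc", "-o", str(outdir), "-t", str(threads), *(str(f) for f in fastq_files)]


def run_fastqc(fastq_files: Sequence[Path], outdir: Path, threads: int = 2) -> None:
    """Create the output directory first (FastQC expects it to exist), then run FastQC."""
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(fastqc_command(fastq_files, outdir, threads), check=True)
```

A call could look like `run_fastqc(sorted(Path("Data/wgs_data").glob("*.fastq.gz")), Path("fastqc_wgs"))`, with both paths being placeholders.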
01-04-2020
1h
- Renamed the wgs_data and rna_seq_data (raw & trimmed) by hand, creating new softlinks with the new names.
02-04-2020
1h
- Ran FastQC on wgs_data, raw rna_seq_data and trimmed rna_seq_data using the fastqc*.sh scripts
- Started looking at the output files.
03-04-2020
2,5h
- Revised my project plan after feedback
- Started writing the method section for QC
4h
- Ran MultiQC on fastqc_wgs, trimmed fastqc_rna_seq and raw fastqc_rna_seq
- Trimmed the raw rna_seq data using Trimmomatic
- Ran FastQC on the newly trimmed data
- Compared the FastQC output of the raw rna_seq data with that of the newly trimmed rna_seq data; the newly trimmed data looked better.
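In paired-end mode Trimmomatic takes two input files and writes four outputs: paired and unpaired reads for each mate. A sketch of building that argument list, assuming a `trimmomatic` wrapper on the PATH (otherwise `java -jar trimmomatic.jar`). The trimming steps are the paired-end example from the Trimmomatic manual and the output names are hypothetical; neither reflects the parameters used in this project:

```python
from pathlib import Path
from typing import List, Sequence

# Steps from the Trimmomatic manual's paired-end example,
# NOT necessarily the settings used in this project.
MANUAL_EXAMPLE_STEPS = (
    "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10",
    "LEADING:3",
    "TRAILING:3",
    "SLIDINGWINDOW:4:15",
    "MINLEN:36",
)


def _stem(path: str) -> str:
    """File name without extensions, e.g. 'raw/s1_1.fastq.gz' -> 's1_1'."""
    return Path(path).name.split(".")[0]


def trimmomatic_pe_args(r1: str, r2: str, outdir: str, threads: int = 2,
                        steps: Sequence[str] = MANUAL_EXAMPLE_STEPS) -> List[str]:
    """Argument list for `trimmomatic PE` (the output naming scheme is hypothetical)."""
    out = Path(outdir)
    outputs: List[str] = []
    for read in (r1, r2):
        # Per mate: reads whose partner survived, then reads left without a partner
        outputs += [str(out / f"{_stem(read)}.paired.fq.gz"),
                    str(out / f"{_stem(read)}.unpaired.fq.gz")]
    return ["trimmomatic", "PE", "-threads", str(threads), r1, r2, *outputs, *steps]
```

The unpaired files are single-end reads, so downstream tools that distinguish paired-end from single-end input need them handled separately.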
07-04-2020
3h
- Started looking into how to write the config file for SOAPdenovo.
14-04-2020
2h
- Started writing the results and discussion on the 1. Preprocessing page
15-04-2020
4h
- Wrote the SOAPdenovo config file and tried to run SOAPdenovo-127mer, but it didn't work.
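A SOAPdenovo config file holds global settings such as `max_rd_len`, followed by one `[LIB]` section per sequencing library. A Python sketch that renders such a file; the read length, insert size and file names are placeholders rather than this project's values, and only a subset of the available keys is included:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Library:
    """One paired-end library; all values used with it here are placeholders."""
    q1: str                 # FASTQ file with read 1
    q2: str                 # FASTQ file with read 2 (must follow q1)
    avg_ins: int            # average insert size
    reverse_seq: int = 0    # 0 for forward-reverse paired-end reads
    asm_flags: int = 3      # 3 = use reads for both contig and scaffold assembly
    rank: int = 1           # order in which libraries are used for scaffolding


def soapdenovo_config(max_rd_len: int, libs: Sequence[Library]) -> str:
    """Render a config: global settings first, then one [LIB] block per library."""
    lines = [f"max_rd_len={max_rd_len}"]
    for lib in libs:
        lines += [
            "[LIB]",
            f"avg_ins={lib.avg_ins}",
            f"reverse_seq={lib.reverse_seq}",
            f"asm_flags={lib.asm_flags}",
            f"rank={lib.rank}",
            f"q1={lib.q1}",
            f"q2={lib.q2}",
        ]
    return "\n".join(lines) + "\n"
```

The assembler is then typically started with something like `SOAPdenovo-63mer all -s <config> -K <k> -o <prefix>`.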
16-04-2020
0,5h
- Ran SOAPdenovo-63mer and pushed the results to git. Pushing took a long time because of the large files, and some files could not be pushed since they were "larger than GitHub's recommended maximum file size of 50.00 MB" or "exceeds GitHub's file size limit of 100.00 MB".
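One way to catch such files before pushing is to scan the repository for files above the two thresholds in those messages (50 MB warning, 100 MB hard limit, treated here as MiB). A minimal Python sketch with a hypothetical function name; symlinks are skipped because git stores only the link path, not the linked data:

```python
import os
from pathlib import Path
from typing import Dict, List

WARN_BYTES = 50 * 1024 ** 2    # GitHub warns above this size
LIMIT_BYTES = 100 * 1024 ** 2  # GitHub rejects files above this size


def classify_large_files(root: Path) -> Dict[str, List[Path]]:
    """Walk `root` (skipping .git) and sort files by GitHub's size thresholds."""
    found: Dict[str, List[Path]] = {"warn": [], "blocked": []}
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".git"]  # don't descend into .git
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink():
                continue  # git stores symlinks as links, not as file contents
            size = path.stat().st_size
            if size > LIMIT_BYTES:
                found["blocked"].append(path)
            elif size > WARN_BYTES:
                found["warn"].append(path)
    return found
```

Files reported as blocked can then be listed in `.gitignore` before committing.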
17-04-2020
4h
- Ran Nucmer and tried to run mummerplot, but it didn't work for some reason.
21-04-2020
0,5h
- Ran the MUMmerplot job with an increased batch job time limit (it took 3,5 h) and looked at the output plot, which looked weird.
23-04-2020
2,5h
- Ran different MUMmerplot jobs and rearranged my repository to get a better structure for multiple runs and outputs.
4h
- Ran MUMmerplot with --layout and --filter and finally got a good result
- Wrote a Python script to remove short contigs from the data
- Cleaned the .contig file and the .scafSeq file using the Python script.
- Ran Nucmer and MUMmerplot on the cleaned data
- Looked at the plots from MUMmerplot
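The cleaning script is not shown here. A minimal Python sketch of the same idea, assuming the .contig and .scafSeq files are FASTA (as SOAPdenovo writes them) and that "short" means below a chosen length cutoff; the cutoff and the function names are hypothetical:

```python
from typing import Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequences may span several lines."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(in_handle: TextIO, out_handle: TextIO, min_len: int, width: int = 60) -> int:
    """Write only records with at least `min_len` bases; return how many were kept."""
    kept = 0
    for header, seq in read_fasta(in_handle):
        if len(seq) >= min_len:
            out_handle.write(f">{header}\n")
            for i in range(0, len(seq), width):  # re-wrap the sequence lines
                out_handle.write(seq[i:i + width] + "\n")
            kept += 1
    return kept
```

Usage would be along the lines of opening the .contig file for reading and a new file for writing and calling `filter_by_length(src, dst, min_len=1000)`, where 1000 is an arbitrary example cutoff.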
25-04-2020
3h
- Wrote the method and started the results on the 2. DNA assembly wiki page
28-04-2020
4h
- Planned to run Trinity but ended up running Bowtie and TopHat instead, so that their output can later be used in genome-guided Trinity.
29-04-2020
4h
- Ran Trinity with the output from TopHat
- Started looking into how to run Maker
- Tried to run the first step (CEGMA) in this tutorial, but it didn't work.
04-05-2020
4h
- Tried to run the first step (CEGMA) in this tutorial again, but it still didn't work.
- Started looking into this tutorial instead
- Ran steps 1-5 of that tutorial
08-05-2020
4h
- Since UPPMAX was down, I couldn't run any software. Instead I prepared as much as I could to be able to run step 6 as soon as UPPMAX was up and running again, e.g. by writing the different Perl scripts and a batch script to run everything in step 6.
- Worked on my wiki
10-05-2020
- UPPMAX still down...
11-05-2020
4h
- Started looking into EggNOG and htseq a bit
- UPPMAX was working again, so I continued running Maker
14-05-2020
4h
- Struggled with running maker
15-05-2020
4h
- Tried to figure out how to run htseq
- Got a lot of errors but eventually succeeded in running htseq after I had separated the paired-end and single-end reads.
18-05-2020
7h
- Tried to figure out how to interpret my htseq output
- Wrote a python script to count features
- Ran TopHat for each sample (18 times)
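htseq-count writes one tab-separated line per feature (feature ID and count), followed by special counters whose names start with `__`, such as `__no_feature` and `__ambiguous`. A minimal Python sketch of counting features from such a file; it is an illustration, not the script mentioned above, and the summary keys are hypothetical:

```python
from typing import Dict, TextIO, Tuple


def parse_htseq_counts(handle: TextIO) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Split htseq-count output into per-feature counts and '__' special counters."""
    features: Dict[str, int] = {}
    special: Dict[str, int] = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        name, count = line.rsplit("\t", 1)
        # Special counters such as __no_feature and __ambiguous start with '__'
        (special if name.startswith("__") else features)[name] = int(count)
    return features, special


def summarize(features: Dict[str, int], special: Dict[str, int]) -> Dict[str, int]:
    """Feature totals plus counts assigned to features vs. left in special counters."""
    return {
        "features": len(features),
        "expressed_features": sum(1 for c in features.values() if c > 0),
        "assigned": sum(features.values()),
        "not_assigned": sum(special.values()),
    }
```

Comparing `assigned` with `not_assigned` gives a quick sense of how many reads actually landed on annotated features.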
20-05-2020
7h
- Wrote an R script for the differential expression analysis
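The R script is not shown here. Differential expression packages such as DESeq2 usually start from a gene-by-sample count matrix. A Python sketch of merging per-sample htseq-count outputs into one tab-separated matrix that R can read, dropping the `__` summary lines; the sample names and helper names are hypothetical:

```python
import csv
from typing import Dict, List, Mapping, TextIO


def read_htseq(handle: TextIO) -> Dict[str, int]:
    """Read one htseq-count file, skipping the '__' summary lines."""
    counts: Dict[str, int] = {}
    for line in handle:
        line = line.strip()
        if line and not line.startswith("__"):
            gene, n = line.rsplit("\t", 1)
            counts[gene] = int(n)
    return counts


def build_count_matrix(samples: Mapping[str, Dict[str, int]]) -> List[List[str]]:
    """Genes as rows, samples as columns; a gene missing from a sample gets 0."""
    names = list(samples)
    genes = sorted(set().union(*(c.keys() for c in samples.values())))
    rows = [["gene_id", *names]]
    for gene in genes:
        rows.append([gene, *(str(samples[n].get(gene, 0)) for n in names)])
    return rows


def write_tsv(rows: List[List[str]], handle: TextIO) -> None:
    """Write rows as tab-separated text with Unix line endings."""
    csv.writer(handle, delimiter="\t", lineterminator="\n").writerows(rows)
```

In R the file can then be read with `read.delim(..., row.names = 1)`. DESeq2 also provides `DESeqDataSetFromHTSeqCount`, which reads htseq-count files directly.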
24-05-2020
3,5h
- Worked on my wiki