4. Annotation - Sara-SL/GenomeAnalysis GitHub Wiki

Method

For structural annotation I planned to use the software Maker2. I followed this pipeline, and in the AUGUSTUS section I used the Perl scripts from this GitHub. However, when I got to the GeneMark step I got bad results, so I planned to start over from the beginning using the contig from the paper (sel4_NW_015503979.fna) instead of my assembly (.contig file) and also add proteins to the maker_opts.ctl file. At that point there had been a lot of problems getting the Maker pipeline to work for everyone in our group, and those who had already tried using the contig from the paper and adding proteins had not gotten it to work. The TAs were therefore going to look at the Maker pipeline to see if they could get it to work, but they never did, so I never finished the structural annotation analysis. Instead I continued with the functional analysis using the result from Trinity.
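
The planned change to maker_opts.ctl (pointing Maker at the contig from the paper and adding protein evidence) could be scripted roughly as below. This is only a sketch, not what I actually ran: the helper name `set_ctl_options`, the abridged control-file text, and the protein file name `proteins.fasta` are made up for illustration, and it assumes the control file uses `key=value #comment` lines with `genome` and `protein` keys.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the value of each key in `updates` replaced.

    Assumes lines of the form `key=value #comment`; section headers and
    other lines are passed through unchanged.
    """
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in updates:
            key = match.group(1)
            comment = match.group(3) or ""
            # Keep the trailing comment so the file stays self-documenting.
            line = f"{key}={updates[key]} {comment}".rstrip()
        out.append(line)
    return "\n".join(out) + "\n"


# Abridged, hypothetical stand-in for a real maker_opts.ctl file.
example_ctl = (
    "#-----Genome\n"
    "genome= #genome sequence\n"
    "#-----Protein Homology Evidence\n"
    "protein= #protein sequence file in fasta format\n"
)

updated = set_ctl_options(
    example_ctl,
    {
        "genome": "sel4_NW_015503979.fna",  # contig from the paper
        "protein": "proteins.fasta",        # hypothetical protein file name
    },
)
print(updated)
```

Editing the file by hand works just as well; the point of a script would only be to make the switch between my own assembly and the contig from the paper repeatable.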

For functional annotation I used the online tool eggNOG. I uploaded the Trinity-GG.fasta file and used the settings Taxonomic Scope: Auto adjust per query (RECOMMENDED) and Orthology restrictions: Transfer annotations from one-to-one orthology only.

Result

The results from eggNOG can be found here. The result contains 72 scanned queries.
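
A quick way to sanity-check a number like this is to count the sequences in the uploaded FASTA file and the distinct queries in the annotation table. The sketch below is an illustration only: it assumes the annotation table is tab-separated with the query name in the first column and `#` at the start of comment and header lines, and the tiny in-memory inputs are made up rather than taken from my results.

```python
import io


def count_fasta_records(handle):
    """Count sequences in a FASTA file (one '>' header line per record)."""
    return sum(1 for line in handle if line.startswith(">"))


def count_annotated_queries(handle):
    """Count distinct query names in a tab-separated annotation table.

    Assumes the query name is the first column and that comment/header
    lines start with '#'.
    """
    queries = set()
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        queries.add(line.split("\t", 1)[0])
    return len(queries)


# Made-up miniature inputs standing in for the real files.
fasta = io.StringIO(">t1\nACGT\n>t2\nGGCC\n>t3\nTTAA\n")
table = io.StringIO("#query\tdescription\nt1\tkinase\nt3\ttransporter\n")

n_records = count_fasta_records(fasta)
n_annotated = count_annotated_queries(table)
print(n_records, n_annotated)  # prints "3 2"
```

With real files, the two handles would come from `open(...)` on the Trinity-GG.fasta file and the downloaded annotation table.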

Discussion

eggNOG assigns a biological function to the input sequences. The plan was to use the output from Maker as input to eggNOG, but since the Maker pipeline didn't work I used the Trinity-GG.fasta file instead. Maker finds the physical regions of a genome that encode genomic features, so its output would include all genes and features found in the genome. Unlike the transcriptome assembly, which only contains sequences that are transcribed into RNA, the Maker output would for example also have included genes that are not expressed (no mRNA). However, since the Maker pipeline didn't work out, I used the original annotation from the paper for further analysis.