14. DAY 15.1.2017 - mai0/Project_BB2491 GitHub Wiki

Thoughts of the day 👍

''No human being will ever know the Truth, for even if they happen to say it by chance, they would not even know they had done so.'' (Xenophanes)

  • We found that ABySS outperformed SOAPdenovo on assembly quality. I looked for documentation or reports of other people getting the same result, but I could not reach a conclusion. On running time, though, SOAPdenovo outperformed ABySS: its total running time was around 20 minutes. Since the SOAPdenovo documentation lists many parameters, my guess is that a different parameter setting may suit our data better and give more favorable results. However, we did not have enough time to experiment with this, since we got our results a bit later than expected and it was hard to focus and communicate over Christmas. So one possible improvement is to experiment with different parameters.

  • Also, ABySS produced scaffolds at the end of the assembly, whereas SOAPdenovo only produced contigs. The comparison we did to select the best assembler was based on N50, but computed on scaffolds for ABySS and on contigs for SOAPdenovo. I suggest that, to compare these 2 assemblers more fairly, we should find a more universal but still concrete statistic.

  • Regarding the quality of the scaffolds coming from ABySS, we still have to optimize the k-mer value.

  • Another aspect is improving the scaffolds, which could be done with a GC selection. The chloroplast (cp) GC content is generally around 40%, so with a GC threshold we can keep only those scaffolds that meet it.

  • Annotating the assembly based on the reference spruce genome will bias the result towards the reference, so many similarities between them are to be expected. One solution is to also compare our assembly with a different species, such as the white spruce, and then compare the two sets of results with each other. We could also use longer reads and mate-pair data.

  • The reason we assemble the chloroplast of the Norway spruce is that chloroplasts are a very important part of plants: they play a vital role in photosynthesis and contain conserved information that is important for studying evolution. This can be very useful for understanding evolution better, for improving our coexistence with nature, and for moving towards more targeted plant modifications. Ultimately, this could increase wood production and make conifers more resistant to diseases or more adaptable to climate change.

  • Conifers are a group of gymnosperms that dominate large parts of the world's forests. They are of very high ecological and economic importance, but their genomes are enormous and complex (repetitive), and as a result no complete genome sequence has been achieved so far. If we manage to sequence the spruce genome, we will gain information useful for molecular breeding of conifers and for planning future planting in response to climate change, and the enhanced conifers could serve as feedstock for biofuels. Moreover, it would mean that our technologies have improved to the level where the human genome can be sequenced with better quality as well, which could help with quicker and more accurate prediction and treatment of diseases.
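
The N50 comparison and the GC selection mentioned in the notes above can be sketched in a few lines of Python. This is only a minimal sketch under assumptions, not the procedure used in the project: the function names are hypothetical, the 0.35 to 0.45 GC window is an illustrative choice around the ~40% figure, and breaking scaffolds at runs of `N` (so that both assemblers are compared at contig level) is just one possible way to make the N50 comparison more even.

```python
import re


def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def split_scaffold(seq):
    """Break a scaffold into contig-like pieces at runs of N (gap characters).

    Assumption: gaps are written as one or more N/n characters.
    """
    return [piece for piece in re.split(r"[Nn]+", seq) if piece]


def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return gc / acgt if acgt else 0.0


def filter_by_gc(records, low=0.35, high=0.45):
    """Keep only sequences whose GC fraction lies in [low, high].

    `records` maps a sequence name to its sequence string.
    The default window is an illustrative assumption, not a tuned value.
    """
    return {
        name: seq
        for name, seq in records.items()
        if low <= gc_fraction(seq) <= high
    }
```

For example, the scaffolds from one assembler could be passed through `split_scaffold` before calling `n50` on the piece lengths, so that the number is computed on the same kind of sequence as the contigs from the other assembler.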