Result - Siqi-Li-0112/Genome-Analysis GitHub Wiki

Result

Genome assembly result

The assembly was evaluated with Quast and MUMmerplot. Quast reported an N50 of 574089, which is not very high but good enough (Nx plot). Quast also showed that the GC content across the scaffold is mostly below 50% (GC content plot), which suggests that the gene density on this scaffold might be low. The MUMmerplot result looks good. Since the assembled genome contained many contigs and I only had a reference sequence from NCBI, it was reasonable to choose a coverage (percent identity) plot. In this plot the x axis represents the position along the reference sequence and the y axis represents the similarity between each contig and the reference. The result shows that many of the contigs have a similarity higher than 80% (MUMmerplot).
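To make the N50 figure above easier to interpret, here is a minimal sketch of how N50 is commonly computed from a list of contig lengths: sort the contigs from longest to shortest and report the length at which the running total first reaches half of the total assembly size. The function name and the toy lengths are made up for illustration; this is not Quast's own code.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy contig lengths (hypothetical, not taken from this assembly).
example_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(example_lengths))  # prints 70
```

In the toy example the total length is 300, and the two longest contigs (80 + 70) already reach 150, so the N50 is 70.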

I also made a dot plot. For this plot I used the "-l" option of MUMmerplot, which lays out a multiplot by ordering and orienting the sequences so that the largest hits cluster near the main diagonal.

Differential expression analysis

I first compared gene expression in arils from two different clades (Musang King and Monthong). I used the default settings to get a first look at the result. The blue dots are genes whose expression level differs between the clades. I then filtered the result to select the significantly different genes: genes with FDR<0.1 and an absolute log2FoldChange of at least 1.0 were marked as significant, and all others as nonsignificant. This marks genes whose expression differed at least two-fold.
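The filtering rule described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the row layout, the `padj` column name for the FDR-adjusted p-value, and the gene IDs are hypothetical, and the actual filtering in this project may have been done with a different tool.

```python
def is_significant(padj, log2fc, fdr_cutoff=0.1, lfc_cutoff=1.0):
    """Apply the rule FDR < 0.1 and |log2FoldChange| >= 1.0.

    A |log2FoldChange| of 1.0 corresponds to a two-fold change (2**1 = 2).
    Genes without an adjusted p-value are treated as nonsignificant.
    """
    if padj is None:
        return False
    return padj < fdr_cutoff and abs(log2fc) >= lfc_cutoff


def significant_genes(rows):
    """Return the IDs of all rows that pass the significance rule."""
    return [
        row["gene"]
        for row in rows
        if is_significant(row["padj"], row["log2FoldChange"])
    ]


# Hypothetical result rows, not real data from this analysis.
rows = [
    {"gene": "g1", "padj": 0.01, "log2FoldChange": 2.3},
    {"gene": "g2", "padj": 0.04, "log2FoldChange": -1.0},
    {"gene": "g3", "padj": 0.50, "log2FoldChange": 3.0},
    {"gene": "g4", "padj": 0.02, "log2FoldChange": 0.4},
    {"gene": "g5", "padj": None, "log2FoldChange": 5.0},
]
print(significant_genes(rows))  # prints ['g1', 'g2']
```

Taking the absolute value keeps both up- and down-regulated genes, which is why `g2` passes with a log2FoldChange of -1.0.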

97 genes were found to be expressed significantly differently between arils from the two clades. For all the significant results please check here

Similar analyses were performed on different parts of Musang King. I compared leaf and aril, root and aril, and stem and aril.
The following are the first glance at leaf vs. aril and the filtered result (p<0.1 and absolute log2FoldChange >1): First glance, Filtered result

The following are the first glance at root vs. aril and the filtered result (p<0.1 and absolute log2FoldChange >1)

The following are the first glance at stem vs. aril and the filtered result (p<0.1 and absolute log2FoldChange >1)

I then collected all the significant results from each comparison and got 74 genes. These genes are all expressed significantly differently in aril than in other parts, which means they may be related to some aril-specific processes. For significant genes with annotation from eggNOG please check here
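As a rough illustration of combining the per-comparison gene lists, the sketch below uses Python sets. The text above does not state whether the lists were combined as a union (significant in at least one comparison) or an intersection (significant in every comparison), so both are shown. The comparison names and gene IDs are hypothetical.

```python
def combine_significant(per_comparison):
    """Combine significant-gene lists from several comparisons.

    Returns a pair (in_any, in_all):
      in_any - genes significant in at least one comparison (union)
      in_all - genes significant in every comparison (intersection)
    """
    sets = [set(genes) for genes in per_comparison.values()]
    if not sets:
        return set(), set()
    in_any = set().union(*sets)
    in_all = set.intersection(*sets)
    return in_any, in_all


# Hypothetical gene lists, not the real results of this analysis.
per_comparison = {
    "leaf_vs_aril": ["g1", "g2", "g3"],
    "root_vs_aril": ["g2", "g3", "g4"],
    "stem_vs_aril": ["g3", "g5"],
}
in_any, in_all = combine_significant(per_comparison)
print(sorted(in_any))  # prints ['g1', 'g2', 'g3', 'g4', 'g5']
print(sorted(in_all))  # prints ['g3']
```

The intersection is the stricter reading of "aril-specific": a gene in `in_all` differs from aril in leaf, root, and stem alike, whereas a gene in `in_any` may differ in only one of them.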

Conclusion

The project used PacBio reads to assemble Durio zibethinus cultivar Musang King isolate D1 scaffold_10. The assembly was first built with Canu and then corrected with Pilon using Illumina sequencing reads. The final assembly was assessed with Quast and MUMmerplot, and both indicated good assembly quality. RNA-seq data were then aligned to the assembly to annotate its structure (with Braker), and the structural annotation was used to annotate the function of each structure (with eggNOG mapper). The RNA-seq data were then used to analyze the expression level of genes in each sample. Differential expression analysis found 97 genes that were expressed differently between the clades Musang King and Monthong. Samples from different parts of Musang King were then compared (leaf and aril, root and aril, stem and aril), and 74 genes were found to be expressed differently in aril compared with the other parts.