Synteny comparison - MaryamDost/GenomeAnalysis GitHub Wiki

Synteny comparison

Synteny comparison is a way to visualize synteny blocks, that is, regions in which two genomes share a common order of homologous genes inherited from a common ancestor, across the chromosomes of the genomes being compared.

In this step, synteny was compared to find homologous and conserved genes in L. ferriphilum. The first genome used for comparison belongs to L. ferriphilum strain ML04, for which a completely assembled genome is available. The second genome belongs to L. ferrooxidans, which is another species of the same genus. Finally, Thermodesulfovibrio yellowstonii was chosen to give a clear visualization of the difference in the number of conserved genes between our L. ferriphilum and these other genomes.

The pairwise alignment files were created using BLAST in the UPPMAX terminal after loading the required modules.

# Load modules

module load bioinfo-tools

module load blast

blastn -query file1.fa -subject file2.fa
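ACT reads the comparison as tab-separated values, which blastn writes when tabular output is requested. The sketch below only assembles such a command line. The output file name and the optional e-value cutoff are assumptions made for illustration, not the exact settings used in this project.

```python
import shlex


def build_blastn_command(query, subject, out_path, max_evalue=None):
    """Return a blastn argument list that writes tabular (outfmt 6) hits."""
    cmd = [
        "blastn",
        "-query", query,
        "-subject", subject,
        "-outfmt", "6",  # tab-separated, one hit per line
        "-out", out_path,
    ]
    if max_evalue is not None:
        # Optional: only report hits with an e-value at or below this cutoff.
        cmd += ["-evalue", str(max_evalue)]
    return cmd


# file1.fa and file2.fa are the placeholders used above;
# file1_vs_file2.tsv is a hypothetical output name.
command = build_blastn_command("file1.fa", "file2.fa", "file1_vs_file2.tsv")
print(shlex.join(command))
```

The printed line can be pasted into the terminal once the modules are loaded.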

Artemis Comparison Tool (ACT, run locally) and Circoletto (online) were used as visualization tools.

Results

Figure 1: ACT synteny plot. Our L. ferriphilum is on top and L. ferriphilum ML04 is on the bottom.

Figure 2: Plot generated with Circoletto. Our L. ferriphilum is shown in white and L. ferriphilum ML04 in grey.

Figure 3: ACT synteny plot. Our L. ferriphilum is on top and L. ferrooxidans C2-3 is on the bottom.

Figure 4: Plot generated with Circoletto. Our L. ferriphilum is shown in white and L. ferrooxidans C2-3 in grey.

Figure 5: ACT synteny plot. Our L. ferriphilum is on top and Thermodesulfovibrio yellowstonii is on the bottom.

Figure 6: Plot generated with Circoletto. Our L. ferriphilum is shown in white and Thermodesulfovibrio yellowstonii in grey.

Discussion

The plots show both the regions of similarity and the differences between the compared genomes.

In ACT plots, red bands indicate BLAST matches between sequences in the same orientation, and blue twisted bands indicate BLAST matches between sequences in opposite orientations. We see many blue bands because the complementary DNA strand was sequenced (see the discussion on Assembly Evaluation).
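The orientation behind the red and blue bands can also be read directly from the comparison file. The sketch below assumes the default 12-column tabular BLAST layout, in which subject start and end are columns 9 and 10 and a hit on the opposite strand is reported with start greater than end. The two hits are made-up examples, not data from this project.

```python
def hit_orientation(line):
    """Classify one tabular BLAST hit as 'forward' or 'reverse'."""
    fields = line.rstrip("\n").split("\t")
    sstart, send = int(fields[8]), int(fields[9])
    # Subject coordinates run backwards when the match is on the opposite strand.
    return "forward" if sstart <= send else "reverse"


# Colour ACT uses for each orientation, as described above.
ACT_COLOUR = {"forward": "red", "reverse": "blue"}

# Hypothetical hits: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
example_hits = [
    "contig1\tchr\t98.5\t1200\t18\t0\t1\t1200\t5000\t6199\t0.0\t2100",
    "contig2\tchr\t97.0\t800\t24\t0\t1\t800\t9800\t9001\t0.0\t1350",
]

for hit in example_hits:
    orientation = hit_orientation(hit)
    print(hit.split("\t")[0], orientation, ACT_COLOUR[orientation])
```

Counting the two classes over a whole comparison file gives a quick check of how much of the assembly aligns in the opposite orientation.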

Well-conserved genes are likely to be beneficial for the adaptation and survival of a species. To analyze conserved genes, we looked at homologous genes between L. ferriphilum and three other bacterial genomes. As we can see in Figures 1 and 2, the similarity between our L. ferriphilum and L. ferriphilum ML04 is high, which is expected as both belong to the same species.

The homology with L. ferrooxidans is, however, much lower than expected, see Figures 3 and 4. There is a lot of shuffling, and only a small part of the sequence aligned. This could be because the two species have adapted to different environments and thereby acquired specific beneficial traits. A good comparison between them is found here.

Synteny analysis between L. ferriphilum and T. yellowstonii shows very few matches, see Figures 5 and 6. This is not surprising, as they are phylogenetically further apart and occupy different niches, leading to higher divergence between the genomes.

Lab-manual questions

  • How relevant is the output format that you choose?

It is very relevant, as each visualization tool requires a specific format for its input file. The ACT software used in this project requires tab-separated values.

  • How do the resulting hits vary when you change the minimum e-value?

The Expect value (E) is a parameter that describes the number of hits one can "expect" to see by chance, rather than through homology, when searching a database of a particular size. With a large e-value cutoff, the biological significance of the synteny observed in the plot is lower. Changing the cutoff (the maximum e-value a hit may have and still be reported) removes or adds hits that occur by chance. A low cutoff is good, but if it is too strict it might also remove some biologically relevant hits.
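The effect of the cutoff can be illustrated by filtering an existing comparison file instead of rerunning BLAST. This is a minimal sketch that assumes the default 12-column tabular layout with the e-value in column 11. The three hits are invented to cover a strong, a moderate and a chance-level match.

```python
def filter_by_evalue(lines, max_evalue):
    """Keep the tabular BLAST hits whose e-value is at or below max_evalue."""
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(line.rstrip("\n").split("\t")[10])
        if evalue <= max_evalue:
            kept.append(line)
    return kept


# Hypothetical hits: long strong match, short moderate match, chance-level match.
example_hits = [
    "q\ts\t99.0\t1500\t15\t0\t1\t1500\t1\t1500\t0.0\t2700",
    "q\ts\t85.0\t120\t18\t0\t2000\t2119\t4000\t4119\t3e-20\t100",
    "q\ts\t100.0\t14\t0\t0\t50\t63\t7000\t7013\t4.2\t26.5",
]

for cutoff in (10, 1e-5, 1e-50):
    print(cutoff, len(filter_by_evalue(example_hits, cutoff)))
```

In this toy input, tightening the cutoff first drops the chance-level hit and then the moderate one, which is the trade-off described above.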

  • How is the alignment score calculated?

When aligning with BLAST, a positive score is produced when two letters match, and a negative score is produced when they do not. These scores are summed over the length of the alignment. For nucleotide alignments, BLAST uses, for example, a reward of +2 for each aligned pair of identical letters and a penalty of −3 for each mismatched pair; the exact values depend on the BLAST program and task.
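As a worked example of this sum, the sketch below scores an ungapped nucleotide alignment with the +2/−3 scheme mentioned above. It is a simplification: gaps, which BLAST penalizes separately, are not handled, and the function name and sequences are illustrative only.

```python
def raw_score(query_row, subject_row, reward=2, penalty=-3):
    """Sum match rewards and mismatch penalties over an ungapped alignment."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    if "-" in query_row or "-" in subject_row:
        raise ValueError("gaps are not handled in this sketch")
    score = 0
    for q, s in zip(query_row, subject_row):
        # Identical letters earn the reward; any other pair costs the penalty.
        score += reward if q.upper() == s.upper() else penalty
    return score


# 7 matches and 1 mismatch: 7 * 2 + 1 * (-3) = 11
print(raw_score("ACGTACGT", "ACGTTCGT"))
```

Passing a different reward and penalty shows how the same alignment scores under another scheme.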

  • How important is the number of threads when you blast against a database, or against a particular sequence?

It is important when the sequences are extremely long. With a higher number of threads, BLAST runs jobs in parallel, so it processes them faster and the running time is reduced. However, for shorter sequences the number of threads will not have any significant effect.

References

Fassler J, Cooper P (2011) BLAST Glossary. Available from: https://www.ncbi.nlm.nih.gov/books/NBK62051/

Tzvetkova T, Selenska-Pobell S, Groudeva V (2002) Recovery and Characterization of Leptospirillum ferrooxidans/Leptospirillum ferriphilum and Acidithiobacillus ferrooxidans Natural Isolates from Uranium Mining Waste Piles. Biotechnology & Biotechnological Equipment, 16:1, 111-117. Available from: https://doi.org/10.1080/13102818.2002.10819164

Wheeler D, Bhagwat M (2007) BLAST QuickStart: Example-Driven Web-Based BLAST Tutorial. In: Bergman NH, editor. Comparative Genomics: Volumes 1 and 2. Totowa (NJ): Humana Press. Chapter 9. Available from: https://www.ncbi.nlm.nih.gov/books/NBK1734/