05 Synteny comparison - saltpinna/Genome_analysis_project GitHub Wiki
Synteny comparison was done using BLASTN. The script used for this can be found under code/scripts/Blasnt_synteny_comparison.sh. The results of the synteny comparison were visualized in ACT (Artemis Comparison Tool), which produced the plot below:
Red lines connect regions where the assembly and the reference genome share the same sequence in the same orientation, and blue lines correspond to inversions. The big red blocks in the middle of the assembly indicate that our assembly matches the reference genome quite well in those regions. Some lines there do not align perfectly and there are a few inversions as well, but the blocks are clearly visible. At the beginning and end of the assembly we can see large blue blocks that cross over each other, which indicates inverted sequences. In other words, these regions were assembled with the correct sequence, but in the reverse orientation relative to the reference. This pattern of agreement between the assembly and the reference in the middle of the genome and inversions towards the ends is the same pattern that was found in the MUMmerplot in the assembly evaluation.
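The red/blue distinction in ACT comes from the strand of each BLAST hit. As a minimal sketch (not the project's own code), the snippet below classifies hits by orientation using the fact that BLAST reports a minus-strand hit with its subject start greater than its subject end, and sums the aligned query length per orientation. The hit coordinates are made-up toy values chosen to mimic the "inverted ends, collinear middle" pattern, not data from this assembly.

```python
from typing import Iterable, List, Tuple

# (qstart, qend, sstart, send) for each hit; toy values, not project data.
Hit = Tuple[int, int, int, int]


def orientation(hit: Hit) -> str:
    """BLAST reports minus-strand (inverted) hits with sstart > send."""
    _, _, sstart, send = hit
    return "inverted" if sstart > send else "forward"


def summarize(hits: Iterable[Hit]) -> dict:
    """Sum the aligned query length for each orientation."""
    totals = {"forward": 0, "inverted": 0}
    for hit in hits:
        qstart, qend, _, _ = hit
        totals[orientation(hit)] += qend - qstart + 1
    return totals


toy_hits: List[Hit] = [
    (1, 500, 9000, 8501),       # start of assembly, reverse strand (blue in ACT)
    (4000, 6000, 4000, 6000),   # middle of assembly, same strand (red in ACT)
    (9500, 10000, 600, 101),    # end of assembly, reverse strand (blue in ACT)
]
print(summarize(toy_hits))  # {'forward': 2001, 'inverted': 1001}
```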
Questions
How relevant is the output format that you choose?
The output format chosen when running blastn on the command line matters because the results have to be interpretable, both by us and by downstream tools. Here, it was especially helpful to visualize the synteny comparison in ACT, which takes the blastn output together with the assembled genome and the reference genome sequence, so the BLAST output has to be written in a format that ACT can read, such as the tabular format (-outfmt 6).
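To make the point about output formats concrete, here is a hedged Python sketch that builds a pairwise `blastn` command requesting tabular output (`-outfmt 6`) and parses that output. The 12 columns are the BLAST+ defaults for `-outfmt 6` (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The file names and the example line are hypothetical, and this is not the contents of `Blasnt_synteny_comparison.sh`; the command is only built here, not executed.

```python
import csv
import io
from dataclasses import dataclass
from typing import List, TextIO


def build_blastn_command(query: str, subject: str, out: str) -> List[str]:
    """argv for a pairwise blastn run writing tabular output (file names are hypothetical)."""
    return ["blastn", "-query", query, "-subject", subject,
            "-outfmt", "6", "-out", out]


@dataclass
class BlastHit:
    qseqid: str
    sseqid: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int
    sstart: int
    send: int
    evalue: float
    bitscore: float


# Type of each of the 12 default -outfmt 6 columns, in order.
CONVERTERS = [str, str, float, int, int, int, int, int, int, int, float, float]


def parse_outfmt6(handle: TextIO) -> List[BlastHit]:
    """Parse BLAST+ tabular output, skipping blank and comment lines."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hits.append(BlastHit(*(conv(value) for conv, value in zip(CONVERTERS, row))))
    return hits


cmd = build_blastn_command("assembly.fasta", "reference.fasta", "assembly_vs_ref.tsv")
# subprocess.run(cmd, check=True) would run BLAST+ if it is installed.

example = io.StringIO("contig_1\tref\t99.50\t2000\t10\t0\t1\t2000\t5001\t3002\t0.0\t3650\n")
hits = parse_outfmt6(example)
print(hits[0].sstart > hits[0].send)  # True: a minus-strand (inverted) hit
```

A plain tab-separated table like this is easy to load into other tools as well, whereas the default pairwise text output (`-outfmt 0`) is meant for reading by eye.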
How do the resulting hits vary when you change the minimum e-value?
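The e-value is the number of hits with an equal or better score that would be expected to occur purely by chance in a search of this size. A lower e-value therefore means a more significant match. The e-value cutoff (-evalue) acts as a maximum: raising it lets more hits through, including weaker ones that are more likely to be spurious, while lowering it keeps only the most significant hits.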
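The effect of the cutoff can be illustrated with the Karlin-Altschul formula E = K·m·n·e^(−λS), where S is the raw score, m and n are the query and database lengths, and λ and K are statistical parameters that depend on the scoring system. The sketch below uses made-up values for λ, K, the lengths and the scores, and ignores the effective-length corrections that BLAST actually applies; it only shows that a higher cutoff admits more (lower-scoring) hits.

```python
import math
from typing import Dict, List


def karlin_altschul_evalue(score: float, m: int, n: int,
                           lam: float, k: float) -> float:
    """Expected number of chance hits with raw score >= score: E = K*m*n*exp(-lambda*S)."""
    return k * m * n * math.exp(-lam * score)


def count_hits(evalues: List[float], thresholds: List[float]) -> Dict[float, int]:
    """Number of hits that pass each e-value cutoff (a hit is kept if E <= cutoff)."""
    return {t: sum(1 for e in evalues if e <= t) for t in thresholds}


# Hypothetical parameters and raw scores, chosen only for illustration.
LAMBDA, K = 0.5, 0.1
QUERY_LEN, DB_LEN = 10_000, 1_000_000
raw_scores = [30, 40, 60, 80]

evalues = [karlin_altschul_evalue(s, QUERY_LEN, DB_LEN, LAMBDA, K) for s in raw_scores]
print(count_hits(evalues, [1e-5, 1e-3, 10]))
# {1e-05: 1, 0.001: 2, 10: 3}
```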
How is the alignment score calculated?
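The raw alignment score is the sum of the scores of each aligned position. For proteins, BLAST takes substitution scores from a matrix (such as BLOSUM62) that reflects how likely each substitution is compared to chance; for nucleotides, blastn instead uses a fixed reward for a match and a penalty for a mismatch (the -reward and -penalty options). Gaps are penalized with a cost for opening a gap plus a cost for each gap position (-gapopen and -gapextend). So an exact match adds to the score, while a mismatch or a gap subtracts from it, and summing these over the aligned region gives the alignment score. BLAST then converts this raw score into a bit score and an e-value using the statistical parameters λ and K.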
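As a minimal sketch of this scoring scheme for nucleotides, the function below sums match rewards, mismatch penalties and affine gap costs (open + extend × length) over two already-aligned sequences, with `-` marking gaps. The values used here (reward 2, penalty −3, gap open 5, gap extend 2) are illustrative assumptions; check `blastn -help` for the defaults of the task you actually run. This only scores a given alignment; it does not search for the best one as BLAST does.

```python
def alignment_score(a: str, b: str, reward: int = 2, penalty: int = -3,
                    gap_open: int = 5, gap_extend: int = 2) -> int:
    """Raw score of a gapped pairwise alignment ('-' marks a gap)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    gap_in = None  # which row the current gap is in: "a", "b", or None
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            raise ValueError("column with gaps in both rows")
        if x == "-" or y == "-":
            row = "a" if x == "-" else "b"
            if row != gap_in:
                score -= gap_open  # pay the opening cost once per gap
            score -= gap_extend    # pay the extension cost for every gap position
            gap_in = row
        else:
            score += reward if x == y else penalty
            gap_in = None
    return score


print(alignment_score("ACGTACGT", "ACGAAC-T"))  # 6 matches, 1 mismatch, 1-long gap -> 2
print(alignment_score("ACGTTACG", "ACG--ACG"))  # 6 matches, 2-long gap -> 3
```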
How important is the number of threads when you blast against a database, or against a particular sequence?
The number of threads is important when blasting against an entire database, since the search can then be split across several threads running in parallel, which can speed up the process considerably. When blasting one sequence against a single other sequence, however, the search is small and there is little work to divide, so additional threads make little or no difference.
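The idea of splitting a database search across threads can be sketched with a thread pool that scores the query against each database entry as a separate task. The scoring function here is a hypothetical toy (count of shared k-mers), not BLAST's algorithm, and in CPython the global interpreter lock prevents a real speed-up for pure-Python work like this; BLAST+ itself is compiled code that uses native threads. The sketch only shows how the work is divided: with a single subject sequence there is only one task, so extra threads have nothing to do.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def shared_kmers(query: str, subject: str, k: int = 4) -> int:
    """Toy similarity (hypothetical): number of distinct k-mers both sequences share."""
    q = {query[i:i + k] for i in range(len(query) - k + 1)}
    s = {subject[i:i + k] for i in range(len(subject) - k + 1)}
    return len(q & s)


def search(query: str, database: Dict[str, str], threads: int) -> List[Tuple[str, int]]:
    """Score the query against every database entry, one task per entry."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = {name: pool.submit(shared_kmers, query, seq)
                   for name, seq in database.items()}
        results = [(name, f.result()) for name, f in futures.items()]
    return sorted(results, key=lambda r: r[1], reverse=True)


db = {"seq1": "ACGTACGTGG", "seq2": "TTTTACGTTT", "seq3": "GGGGCCCC"}
print(search("ACGTACGTGG", db, threads=3))
# [('seq1', 6), ('seq2', 2), ('seq3', 0)]
print(search("ACGTACGTGG", {"ref": db["seq1"]}, threads=8))  # one task, one busy thread
```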