4.4.6 Blast plot - WangLabTHU/GPro GitHub Wiki

hcwang and qxdu edited on Aug 4, 2023, 1 version

Introduction

BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of the matches. BLAST-n is designed specifically for nucleotide sequences, so we can use it to measure how similar the generated sequences are to natural ones and to check that the generated sequences remain sufficiently dissimilar from them. However, the reports produced by BLAST are plain text, which makes them inconvenient to inspect. Here we provide a visualization interface that draws a barplot and a density plot of the e-values (the smaller the e-value, the higher the similarity).
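
As a rough illustration of where the plotted numbers come from, the sketch below pulls e-values out of a BLAST text report. It assumes the usual pairwise text layout, in which each alignment carries a line such as ` Score = 42.8 bits (46),  Expect = 0.002`; the helper names `extract_evalues` and `to_log10` are hypothetical and are not part of GPro. Because e-values span many orders of magnitude, the sketch also log10-transforms them, clamping exact zeros to a small floor, which is one common choice for plotting.

```python
import math
import re

# Matches the statistics line of each alignment in a BLAST text report, e.g.
#   " Score = 42.8 bits (46),  Expect = 0.002"
# "Expect(2) = ..." can appear when sum statistics combine several HSPs.
EXPECT_RE = re.compile(r"Expect(?:\(\d+\))?\s*=\s*([^\s,]+)")


def parse_evalue(token: str) -> float:
    """Convert an e-value token to float; some older reports print 'e-120' for 1e-120."""
    if token.startswith("e"):
        token = "1" + token
    return float(token)


def extract_evalues(report_text: str) -> list[float]:
    """Return every e-value found in 'Expect = ...' lines, in report order."""
    return [parse_evalue(m.group(1)) for m in EXPECT_RE.finditer(report_text)]


def to_log10(evalues, floor=1e-300):
    """log10-transform e-values for plotting; exact zeros are clamped to `floor`."""
    return [math.log10(max(e, floor)) for e in evalues]
```

For example, `extract_evalues(open("generated_blast.txt").read())` would return the e-values of all alignments in that (hypothetical) report file.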

Guidance for BLAST-n

If the search parameters are not set appropriately, BLAST may return no hits at all. Here, we analyze Diffusion_blast.txt, a file containing 100 sequences generated by a diffusion model. You can obtain the original data from the blastn_plot_helper/data/ folder (https://drive.google.com/drive/folders/1aQRU69PzXm36bPfIy-0W0HKS8TNlbrja).

The input file should be in FASTA format, and we choose E. coli (taxid:83333) as the organism. You can modify this part to suit your needs. However, we highly recommend choosing "Somewhat similar sequences" with a high Expect threshold. We also reduce Max target sequences to lower the computational cost. For your own data, you can also try adjusting other search parameters, or even run BLAST locally on your own machine.
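
If your generated sequences are stored as plain strings (here we assume one sequence per line in a text file), a minimal sketch for converting them to FASTA might look like the following. The function names `read_sequences` and `write_fasta`, the `>gen_1`-style headers and the 60-character line width are illustrative choices, not GPro conventions.

```python
def read_sequences(path):
    """Read one sequence per line from a plain-text file, skipping blank lines."""
    with open(path) as handle:
        return [line.strip().upper() for line in handle if line.strip()]


def write_fasta(sequences, path, prefix="gen", width=60):
    """Write sequences as FASTA records named >{prefix}_1, >{prefix}_2, ..."""
    with open(path, "w") as handle:
        for i, seq in enumerate(sequences, start=1):
            handle.write(f">{prefix}_{i}\n")
            # Wrap long sequences to `width` characters per line.
            for start in range(0, len(seq), width):
                handle.write(seq[start:start + width] + "\n")
```

For instance, `write_fasta(read_sequences("generated.txt"), "generated.fasta")` (both file names hypothetical) produces a file that can be uploaded as the BLAST query.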

Finally, we choose the Text format under Download All. This text report is the input analyzed by the program described below.
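
Because some queries may return no hits, it can help to summarize the downloaded report per query before plotting. The hedged sketch below reads a text report, splits it at each `Query=` line and keeps the smallest e-value per query, mapping queries without any alignment to `None`. It assumes the pairwise text layout with `Query=` headers and `Expect = ...` lines; `best_evalue_per_query` is a hypothetical helper, not the parser GPro uses.

```python
import re

# "Query= name" starts each query section; alignment rows ("Query  1  ACGT...") have no "=".
QUERY_RE = re.compile(r"^Query=[ \t]*(.*)$", re.MULTILINE)
EXPECT_RE = re.compile(r"Expect(?:\(\d+\))?\s*=\s*([^\s,]+)")


def _to_float(token):
    # Some older reports write e.g. "e-120" for 1e-120.
    return float("1" + token if token.startswith("e") else token)


def best_evalue_per_query(path):
    """Return {query name: smallest e-value}, or None for queries without hits.

    Duplicate query names overwrite each other, so names should be unique.
    """
    with open(path) as handle:
        text = handle.read()
    matches = list(QUERY_RE.finditer(text))
    best = {}
    for i, m in enumerate(matches):
        # Each query's section runs until the next "Query=" line or the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        values = [_to_float(t) for t in EXPECT_RE.findall(text[m.end():end])]
        best[m.group(1).strip()] = min(values) if values else None
    return best
```

Counting the `None` entries gives a quick check of how many sequences found no hits under the chosen search settings.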

Parameters

| params | description | default value |
|---|---|---|
| gen_blast | path to the blastn text report of the generated sequences | |
| nat_blast | path to the blastn text report of the natural sequences | |
| report_path | folder where the output plots are saved | |

Demo

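import os

from gpro.evaluator.blast_plot import blastn_evaluation

project_path = "your project path"  # replace with the folder that contains data/
nat_blast = os.path.join(project_path, 'data/natural_blast.txt')    # blastn text report, natural sequences
gen_blast = os.path.join(project_path, 'data/generated_blast.txt')  # blastn text report, generated sequences
blastn_evaluation(gen_blast, nat_blast, report_path="./results/")   # writes the two PNG plots to ./results/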

You will get blastn_evalue_barplot.png and blastn_evalue_distribution.png under report_path.
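
The page does not describe how the two plots are computed, so the following is only a hypothetical reconstruction of the underlying numbers: bar heights obtained by counting e-values in intervals (the thresholds below are arbitrary assumptions), and a Gaussian kernel density estimate for the distribution plot, using the normal-reference bandwidth rule of thumb. Neither function is taken from GPro; the resulting lists could then be passed to any plotting library.

```python
import bisect
import math


def evalue_bin_counts(evalues, thresholds=(1e-10, 1e-5, 1e-2, 1.0, 10.0)):
    """Count e-values per interval delimited by `thresholds` (bar heights).

    Bin 0 holds e < thresholds[0], bin k holds thresholds[k-1] <= e < thresholds[k],
    and the last bin holds e >= thresholds[-1]. `thresholds` must be sorted.
    """
    counts = [0] * (len(thresholds) + 1)
    for e in evalues:
        counts[bisect.bisect_right(thresholds, e)] += 1
    return counts


def gaussian_kde(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated at each point of `grid`."""
    n = len(samples)
    if bandwidth is None:
        if n < 2:
            raise ValueError("need at least two samples to pick a bandwidth")
        mean = sum(samples) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)  # normal-reference rule of thumb
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in samples)
            for g in grid]
```

Since e-values span many orders of magnitude, the density is more readable when estimated on log10(e-value), e.g. `gaussian_kde([math.log10(max(e, 1e-300)) for e in evalues], grid)`, computed separately for the generated and natural reports so the two curves can be compared.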

blastn_evalue_barplot.png should look like this:

blastn_evalue_distribution.png should look like this:
