Important notes on blast functions - ampinzonv/BB3 GitHub Wiki

BLAST Compatibility Notice for BioBASH

BioBASH uses a customized tabular format for BLAST results. It is based on the standard -outfmt 6 and extends it with additional fields needed for coverage calculations and reciprocal best hit detection.

This is the default behaviour in BioBASH, so if your results were created using BioBASH they are already compatible.


🔍 Required Output Format

BioBASH expects BLAST results to include 14 columns, as shown below:

qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen
  • The last two columns, qlen (query length) and slen (subject length), are not part of the default outfmt 6 output, which has only the first 12 columns.
  • These are required by functions such as bb_blast_best_hit, bb_blast_summary, and bb_plot_blast_hits_txt.
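To illustrate why qlen and slen matter, here is a minimal sketch of a per-HSP coverage calculation over the 14-column layout. The helper name blast_hsp_coverage and the coverage formula (aligned span divided by sequence length) are assumptions for illustration; BioBASH's own functions may compute coverage differently.

```shell
# Hypothetical helper (not part of BioBASH): print qseqid, sseqid,
# query coverage (%) and subject coverage (%) for each HSP.
# Column positions follow the 14-column layout:
#   $7=qstart $8=qend $9=sstart $10=send $13=qlen $14=slen
blast_hsp_coverage() {
  awk -F'\t' '
    NF == 14 && $13 > 0 && $14 > 0 {
      # Start can exceed end for reverse-strand hits, so use the absolute span.
      qspan = ($8 >= $7)  ? $8 - $7 + 1  : $7 - $8 + 1
      sspan = ($10 >= $9) ? $10 - $9 + 1 : $9 - $10 + 1
      printf "%s\t%s\t%.1f\t%.1f\n", $1, $2, qspan / $13 * 100, sspan / $14 * 100
    }' "$1"
}
```

Running blast_hsp_coverage output.tsv on a standard 12-column file prints nothing, because no line passes the NF == 14 check. Without the two length columns there is no denominator for either coverage value.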

⚠️ Incompatibility Warning

If you use standard BLAST parameters or load BLAST outputs from external sources (e.g. Galaxy, web BLAST), they may lack the required fields. This will result in errors or invalid statistics in BioBASH.
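Before passing an external file to BioBASH, it can help to confirm the column count. The sketch below assumes a tab-separated file; check_blast_columns is a hypothetical helper, not a BioBASH function.

```shell
# Hypothetical helper (not part of BioBASH): succeed only if every data line
# has exactly 14 tab-separated fields. Blank lines and "#" comment lines
# (as written by -outfmt 7) are skipped.
check_blast_columns() {
  awk -F'\t' -v expected=14 '
    BEGIN { bad = 0 }
    NF == 0 || /^#/ { next }
    NF != expected {
      printf "line %d: %d columns (expected %d)\n", NR, NF, expected
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example, check_blast_columns output.tsv && echo "compatible" prints "compatible" only when all data lines have 14 fields. A file with the right column count can still hold the wrong fields in the wrong order, so this is a quick sanity check rather than a full validation.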


✅ How to Generate a Compatible File

To generate a compatible file for use in BioBASH, run BLAST using the following -outfmt:

blastp -query query.fasta -db db/mydb -out output.tsv \
  -outfmt '6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen'

Make sure the database is formatted correctly using makeblastdb and that your input sequences are in FASTA format.
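As a rough sketch of that preparation step for a protein database, the helpers below check that a file starts with a FASTA header and then call makeblastdb. The function names and file paths are hypothetical; only the makeblastdb options -in, -dbtype and -out come from BLAST+ itself. For nucleotide sequences you would use -dbtype nucl.

```shell
# Hypothetical helper: succeed if the first non-blank line is a FASTA header.
looks_like_fasta() {
  awk 'NF { ok = ($0 ~ /^>/); exit } END { exit (ok ? 0 : 1) }' "$1"
}

# Hypothetical helper: build a protein BLAST database from a FASTA file.
#   $1 = input FASTA, $2 = database base name (e.g. db/mydb)
make_protein_db() {
  looks_like_fasta "$1" || { echo "not a FASTA file: $1" >&2; return 1; }
  makeblastdb -in "$1" -dbtype prot -out "$2"
}
```

A call such as make_protein_db proteins.fasta db/mydb would produce the index files that the blastp command above refers to as db/mydb.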


📦 Naming BLAST Databases in BioBASH

When using any BLAST-related function in BioBASH (e.g., bb_run_blast, bb_blast_on_the_fly, bb_blast_reciprocal), you must reference the BLAST database by its base name only, not by any of its index file names.

Example:

If your BLAST database consists of these files:

db/mibasededatos.nsq
db/mibasededatos.nin
db/mibasededatos.nhr

You must use:

--db db/mibasededatos

Do not include .nin, .nsq, or any extension. BioBASH and the BLAST+ suite will automatically detect the correct database files.
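If a script only has the path of one index file, the base name can be derived by stripping the extension. This is a minimal sketch; blast_db_basename is a hypothetical helper, and it only handles the common single-volume nucleotide (.nsq, .nin, .nhr) and protein (.psq, .pin, .phr) extensions.

```shell
# Hypothetical helper (not part of BioBASH): strip a known BLAST index
# extension from a path; any other path is printed unchanged.
blast_db_basename() {
  case "$1" in
    *.nsq|*.nin|*.nhr|*.psq|*.pin|*.phr) printf '%s\n' "${1%.*}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}
```

For example, blast_db_basename db/mibasededatos.nsq prints db/mibasededatos, which is the value to pass to --db.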


📌 Summary

  • BioBASH requires a 14-column outfmt 6 with qlen and slen.
  • Pass the explicit 14-field -outfmt string to BLAST when generating results outside BioBASH.
  • Always refer to BLAST databases using their base name only.