Overview - ampinzonv/BB3 GitHub Wiki

BioBASH v0.3 "Jazzy"

BioBASH is a lightweight and portable collection of command-line utilities written in pure Bash, designed to support common bioinformatics tasks directly from the terminal. It prioritizes simplicity, modularity, and compatibility with both Linux and macOS systems.


🔧 Module Overview

📁 file.sh

Functions for working with FASTA and FASTQ files:

  • Extract headers, IDs, sequences
  • Subset entries by ID or coordinate range
  • Convert FASTQ to FASTA
  • Compute basic statistics (e.g. N50)
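
As an illustration of the kind of statistic listed above, the following is a minimal sketch of an N50 calculation over a FASTA stream. The function name fasta_n50 is hypothetical and this is not BioBASH's own implementation; it only assumes standard awk and sort.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BioBASH): print the N50 of a FASTA file or stream.
fasta_n50() {
    # Step 1: emit one sequence length per record (multi-line records are summed).
    awk '
        /^>/ { if (seen) print len; len = 0; seen = 1; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (seen) print len }
    ' "${1:--}" |
    # Step 2: sort the lengths from longest to shortest.
    sort -rn |
    # Step 3: walk down the list until at least half of the total length is covered.
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                if (running * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}

# Lengths are 8, 4, 3 and 2 (total 17); 8 + 4 = 12 is the first sum reaching half.
printf '>a\nACGTACGT\n>b\nACGT\n>c\nACG\n>d\nAC\n' | fasta_n50   # prints 4
```

With no argument the sketch reads STDIN; a file path can be passed as the first argument instead.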

📜 utility.sh

General-purpose helper functions:

  • Count or list unique items
  • Parse arguments
  • Validate inputs and directories
  • Used internally across modules

💥 blast.sh

Interfaces to NCBI BLAST+ tools:

  • Run BLAST searches (bb_run_blast)
  • Create BLAST databases
  • Parse and filter BLAST outputs
  • Detect reciprocal best hits (RBH)
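
To make the reciprocal best hit idea concrete, here is a rough sketch that works on two BLAST tabular result files (A searched against B, and B against A). It assumes the standard 12-column tabular format (-outfmt 6) with the bit score in column 12, and it picks the best hit by highest bit score. The function names are hypothetical and do not reflect how BioBASH implements RBH detection.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of BioBASH); assume BLAST -outfmt 6 input.

best_hits() {
    # For every query (column 1) keep the subject (column 2) with the
    # highest bit score (column 12). Ties keep the first hit seen.
    awk -F'\t' '
        !($1 in best) || $12 + 0 > best[$1] { best[$1] = $12 + 0; hit[$1] = $2 }
        END { for (q in hit) print q "\t" hit[q] }
    ' "$1"
}

reciprocal_best_hits() {
    local a_vs_b=$1 b_vs_a=$2
    # Load the B->A best hits first, then print each A->B best hit
    # whose subject points back to the same query.
    awk -F'\t' '
        NR == FNR { back[$1] = $2; next }
        ($2 in back) && back[$2] == $1 { print $1 "\t" $2 }
    ' <(best_hits "$b_vs_a") <(best_hits "$a_vs_b") | sort
}

# Usage (file names are placeholders):
#   reciprocal_best_hits a_vs_b.tsv b_vs_a.tsv
```

Each output line is a tab-separated pair: a sequence from A and its reciprocal partner in B.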

📈 plot_ascii.sh

Minimalist ASCII plotting tools:

  • Visualize BLAST hit coverage
  • Plot histograms of FASTQ quality scores
  • All plots are text-only, optimized for terminals, SSH sessions, or log file inclusion
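
A text-only histogram needs nothing beyond sort, uniq, and awk. The sketch below shows one possible approach; ascii_histogram is a hypothetical name, and the output layout is an assumption rather than the format BioBASH produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BioBASH): horizontal ASCII histogram
# of the numeric values read from STDIN, one value per line.
ascii_histogram() {
    local width=${1:-40}   # length of the longest bar, in characters
    sort -n | uniq -c | awk -v width="$width" '
        { count[NR] = $1; label[NR] = $2; if ($1 > max) max = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                n = int(count[i] * width / max)
                if (n < 1) n = 1              # always draw at least one mark
                bar = ""
                for (j = 0; j < n; j++) bar = bar "#"
                printf "%4s | %s %d\n", label[i], bar, count[i]
            }
        }
    '
}

printf '30\n30\n38\n40\n40\n40\n40\n' | ascii_histogram 8
#   30 | #### 2
#   38 | ## 1
#   40 | ######## 4
```

Bars are scaled relative to the most frequent value, so the output fits a fixed terminal width regardless of the input size.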

🔤 Input and Output Philosophy

BioBASH functions are designed to be pipe-friendly and follow UNIX conventions:

Input Type      Mechanism
File            --input file.txt
STDIN           --input -
Paired inputs   Specific flags (e.g. --a, --b)

Output Type     Behavior
STDOUT          Default unless --outfile is used
File output     Use --outfile file.txt, or --outdir for multiple files

Most functions allow redirection and integration into pipelines, for example:

cat sequences.fasta | bb_get_fasta_id --input - | bb_get_list --input -
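
For readers writing their own functions in the same style, the sketch below shows one way to honor the --input and --outfile conventions described above. demo_fasta_ids is a hypothetical function written for this page and is not BioBASH's actual code; treating "-" as /dev/stdin is an assumption about how such a function could be built.

```shell
#!/usr/bin/env bash
# Hypothetical function following the conventions above (not BioBASH code):
# print the ID (first word of each header) of every FASTA record.
demo_fasta_ids() {
    local input="" outfile=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --input|--outfile)
                if [ $# -lt 2 ]; then
                    echo "[ERROR] $1 requires a value" >&2
                    return 1
                fi
                if [ "$1" = "--input" ]; then input=$2; else outfile=$2; fi
                shift 2 ;;
            *)
                echo "[ERROR] Unknown option: $1" >&2
                return 1 ;;
        esac
    done
    if [ -z "$input" ]; then
        echo "[ERROR] --input is required" >&2
        return 1
    fi
    # "-" means STDIN, so files and pipes share a single code path.
    if [ "$input" = "-" ]; then input=/dev/stdin; fi
    # Write to STDOUT by default, or to the file given with --outfile.
    if [ -n "$outfile" ]; then
        sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$input" > "$outfile"
    else
        sed -n 's/^>\([^[:space:]]*\).*/\1/p' "$input"
    fi
}

printf '>s1 first record\nACGT\n>s2\nGG\n' | demo_fasta_ids --input -
```

Because results go to STDOUT unless --outfile is given, the function can sit anywhere in a pipeline.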

⚙️ Default Behaviors

Parameter        Default   Notes
--quiet          Off       [INFO] messages are printed unless --quiet is set
--force          Off       Existing files are not overwritten unless --force is set
--processors     1         Parallelization for BLAST (where supported)
--sample_size    10        For functions using random subsampling
--phred_offset   33        Used for FASTQ quality interpretation
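
To show what the Phred offset means in practice: each character of a FASTQ quality string encodes a score equal to its ASCII code minus the offset. The sketch below is a hypothetical helper, not a BioBASH function, and uses only Bash built-ins.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BioBASH): decode a FASTQ quality string
# into space-separated Phred scores. The offset defaults to 33.
phred_scores() {
    local qual=$1 offset=${2:-33} i ascii out=""
    for (( i = 0; i < ${#qual}; i++ )); do
        # A leading single quote makes printf return the character's numeric code.
        ascii=$(LC_ALL=C printf '%d' "'${qual:i:1}")
        out+="${out:+ }$(( ascii - offset ))"
    done
    printf '%s\n' "$out"
}

phred_scores 'II5!'   # prints: 40 40 20 0
```

With offset 33, "I" (ASCII 73) decodes to 40 and "!" (ASCII 33) decodes to 0. Passing 64 as the second argument would decode older Phred+64 files.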

📚 Philosophy

BioBASH embraces the principle that everything should be transparent and reproducible. All tools output plain text, which is ideal for:

  • Running on remote servers via SSH
  • Logging in pipelines
  • Teaching environments where simplicity is key

Plots are ASCII by design: no dependencies on Python, R, or external libraries, just Bash.


🧪 Compatibility

BioBASH is tested on:

  • Ubuntu 20.04+
  • macOS 12+ (Monterey or later)

Some modules may require BLAST+ (makeblastdb, blastn, etc.) or gzip utilities.
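
Before running BLAST-related functions, it can help to confirm that the external programs are on the PATH. The following is a small sketch using the standard command -v check; missing_tools is a hypothetical helper and not part of BioBASH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BioBASH): report which of the named
# programs cannot be found on the PATH. Returns 1 if any are missing.
missing_tools() {
    local tool missing=()
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || missing+=("$tool")
    done
    if [ ${#missing[@]} -gt 0 ]; then
        printf '[ERROR] Missing dependency: %s\n' "${missing[@]}" >&2
        return 1
    fi
}

missing_tools makeblastdb blastn gzip ||
    echo "Install the tools listed above before using the BLAST modules." >&2
```

The check prints one line per missing program to STDERR, so it does not interfere with piped output.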


🧩 Extending BioBASH

Each function is self-contained and can be:

  • Loaded in a shell session
  • Sourced in other scripts
  • Integrated into Makefiles, Snakemake, or Nextflow

🔗 License & Contributions

BioBASH is open-source and community-driven. Contributions, bug reports, and feature requests are welcome via GitHub.