Overview - ajmoore143/KEGGBLAST GitHub Wiki

Overview

Purpose:
KEGGBLAST automates the common steps required to go from a KEGG Orthology (KO) entry to fully formatted FASTA files and BLAST results, including:

  • Fetching gene IDs from a given KO (e.g. “K09252”)
  • Downloading amino acid (AASEQ) and nucleotide (NTSEQ) sequences for each gene
  • Automatically matching user-provided species names (even if slightly misspelled) to KEGG IDs
  • Saving results (tables, folder structures, FASTA files)
  • Running BLAST (via either gget or the NCBI API), with optional taxonomic filters
  • Caching the KEGG species dictionary locally for faster subsequent runs
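
To make the first and third steps above concrete, here is a minimal standalone sketch that does not use the keggblast API. It assumes gene IDs arrive as the two-column, tab-separated text that KEGG's REST link operation returns (for example from https://rest.kegg.jp/link/genes/K09252), and it uses Python's standard difflib module as a stand-in for species-name matching; the package's own matching method may differ. The function names, the three-entry species dictionary, and the sample gene IDs are all hypothetical and exist only for illustration.

```python
import difflib

# Hypothetical miniature species dictionary (organism name -> KEGG organism code).
# The real KEGG organism list is far larger.
SPECIES_TO_CODE = {
    "Homo sapiens": "hsa",
    "Mus musculus": "mmu",
    "Danio rerio": "dre",
}

# Made-up sample in the "ko:<KO>\t<org>:<gene>" shape of a KEGG link response.
SAMPLE_LINK_TEXT = "ko:K09252\tabc:GENE_0001\nko:K09252\txyz:GENE_0002\n"


def parse_link_response(text):
    """Return the gene IDs (second column) from tab-separated link output."""
    gene_ids = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) == 2:  # skip blank or malformed lines
            gene_ids.append(parts[1].strip())
    return gene_ids


def match_species(name, species_to_code=SPECIES_TO_CODE, cutoff=0.8):
    """Return the KEGG code of the closest species name, or None if no
    candidate reaches the similarity cutoff (case-insensitive)."""
    lookup = {k.lower(): v for k, v in species_to_code.items()}
    hits = difflib.get_close_matches(name.lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None


if __name__ == "__main__":
    print(parse_link_response(SAMPLE_LINK_TEXT))
    print(match_species("Homo sapeins"))      # hsa (misspelling still matches)
    print(match_species("Escherichia coli"))  # None (not in the toy dictionary)
```

The cutoff of 0.8 is an arbitrary choice for this sketch: a lower value tolerates more typos but raises the risk of matching the wrong organism.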

Who Should Read This:

  • Bioinformaticians who need to pull sequences in bulk from KEGG
  • Anyone who wants to automate BLAST searches against a list of KO-derived genes
  • Developers looking to integrate KEGG + BLAST steps into a larger pipeline

Dependencies / Prerequisites:

  • Python 3.7+
  • The keggblast package (install from the repository root via pip install .)