06 Using Sub Workflows - NBChub/bgcflow GitHub Wiki

Available sub workflows

Sub workflows are Snakefiles that can be run on top of the main workflow in BGCFlow. All available workflows can be listed with bgcflow run -h. A subworkflow can be executed by running:

bgcflow run --workflow {workflow name or Snakefile}

As of bgcflow_wrapper v0.3.5, these subworkflows are officially included:

  • BGC: Do comparative BGC analytics of selected antiSMASH BGC regions
  • Database: Build a duckdb database. Same as running bgcflow build database
  • Report: Build Jupyter notebook markdown reports. Same as running bgcflow build report
  • Metabase: Serve a Metabase server. Same as running bgcflow serve --metabase
  • lsabgc: Run a population genetic analysis using the lsabgc-easy pipeline
  • ppanggolin: Build a graph-based pangenome and identify regions of genome plasticity

Additional subworkflows that will be included in bgcflow v0.8.2:

  • Alleleome: Run Core-Alleleome to explore and analyze natural sequence variations within the Open Reading Frames (ORFs) of alleles of core genes in a species' pan-genome, both at the amino acid and nucleotide levels (Archana S. Harke et al., 2023). This can be run by providing the path to the Snakefile:
bgcflow run --workflow workflow/Alleleome

Running a comparative BGC workflow

This feature is used when you have a selection of antiSMASH BGC regions that you want to compare. You might want to run this after finishing the main workflow.

  1. Make a new project folder in config/<project_name> for that particular set of BGCs. You can see the example config format here: https://github.com/NBChub/bgcflow/tree/dev-0.6.1/.examples/lanthipeptide

  2. Create the samples CSV (example: https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/lanthipeptide/df_antismash_6.1.1_bgc.csv). It can be edited from the previous results table (tables/df_regions_antismash_6.1.1.csv). You then need to add these two columns:

  • source (right now just write “bgcflow” as the source)
  • gbk_path (preferably an absolute path to the antismash BGC region genbank file, you can also use your own BGCs)
  3. Create a project config file (https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/lanthipeptide/project_config.yaml). The latest available rules can be seen here: https://github.com/NBChub/bgcflow/blob/dev-0.6.1/workflow/rules_bgc.yaml. The rules currently available are:
  • bigslice
  • query-bigslice
  • bigscape
  • clinker
  • interproscan
  • mmseqs2
  4. Add the project to the global config file in config/config.yaml under the bgc_projects variable (see https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/_config_example.yaml#L27-L28):

bgc_projects:
  - name: config/<project_name>/project_config.yaml

  5. You can then run the subworkflow with, for example:

bgcflow run --workflow workflow/BGC -c 2 -n
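
Editing the samples CSV by hand (step 2) gets tedious when many regions are selected. The sketch below is one possible way to do it with the Python standard library: it copies a regions table and appends the source and gbk_path columns. It is not part of BGCFlow. The function name add_bgc_columns, the bgc_id column name, and the <bgc_id>.gbk file naming are assumptions made for illustration, so check them against your own results table before use.

```python
import csv
from pathlib import Path


def add_bgc_columns(regions_csv, output_csv, gbk_dir, id_column="bgc_id"):
    """Copy a regions table and append the `source` and `gbk_path` columns.

    Hypothetical helper, not part of BGCFlow. `id_column` is assumed to hold
    the identifier used to name each region GenBank file.
    """
    gbk_dir = Path(gbk_dir).resolve()  # absolute paths are preferred

    with open(regions_csv, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or [])
        # Only add the two columns if the table does not have them yet.
        for column in ("source", "gbk_path"):
            if column not in fieldnames:
                fieldnames.append(column)

        rows = []
        for row in reader:
            row["source"] = "bgcflow"
            # Assumption: each region file is named <bgc_id>.gbk in gbk_dir.
            row["gbk_path"] = str(gbk_dir / f"{row[id_column]}.gbk")
            rows.append(row)

    with open(output_csv, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

    return rows
```

A call such as add_bgc_columns("tables/df_regions_antismash_6.1.1.csv", "config/<project_name>/df_antismash_6.1.1_bgc.csv", "<directory with region GenBank files>") would then write the samples CSV into the project folder. Remove any rows for regions you do not want to compare before running the subworkflow.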