06 Using Sub Workflows - NBChub/bgcflow GitHub Wiki
Available sub workflows
Subworkflows are Snakefiles that can be run on top of the main BGCFlow workflow. All available subworkflows are listed by bgcflow run -h. A subworkflow can be executed by running:
bgcflow run --workflow {workflow name or Snakefile}
As of bgcflow_wrapper v0.3.5, these subworkflows are officially included:
- BGC: Run comparative analyses of selected antiSMASH BGC regions
- Database: Build a DuckDB database. Same as running bgcflow build database
- Report: Build Jupyter notebook markdown reports. Same as running bgcflow build report
- Metabase: Start a Metabase server. Same as running bgcflow serve --metabase
- lsabgc: Run a population genetic analysis using the lsabgc-easy pipeline
- ppanggolin: Build a graph-based pangenome and identify regions of genome plasticity
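When scripting several runs, the name-to-command mapping above can be kept in a small Python helper. This is a sketch under assumptions: build_run_command, equivalent_commands, and SHORTCUTS are hypothetical names, not part of bgcflow_wrapper, and the only commands used are the ones shown on this page (bgcflow run --workflow, bgcflow build database, bgcflow build report, bgcflow serve --metabase). It only builds argument lists and does not execute anything.

```python
# Hypothetical helper: builds bgcflow command lines using only the
# commands documented on this page. Nothing is executed here.

# Shortcut commands this page lists as equivalent to
# `bgcflow run --workflow <name>` for three of the subworkflows.
SHORTCUTS = {
    "Database": ["bgcflow", "build", "database"],
    "Report": ["bgcflow", "build", "report"],
    "Metabase": ["bgcflow", "serve", "--metabase"],
}


def build_run_command(workflow):
    """Return argv for `bgcflow run --workflow <workflow>`.

    `workflow` is either a subworkflow name such as "BGC" or a path
    to a Snakefile such as "workflow/Alleleome".
    """
    if not workflow:
        raise ValueError("a subworkflow name or Snakefile path is required")
    return ["bgcflow", "run", "--workflow", workflow]


def equivalent_commands(workflow):
    """Return every documented command line for a subworkflow."""
    commands = [build_run_command(workflow)]
    if workflow in SHORTCUTS:
        commands.append(SHORTCUTS[workflow])
    return commands


if __name__ == "__main__":
    for name in ["BGC", "Database", "workflow/Alleleome"]:
        for argv in equivalent_commands(name):
            print(" ".join(argv))
    # To actually run one (requires bgcflow on PATH):
    # import subprocess
    # subprocess.run(build_run_command("BGC"), check=True)
```

Keeping each command as a list rather than a single string means it can be passed straight to subprocess.run without shell=True, so paths containing spaces stay intact.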
Additional subworkflows that will be included in bgcflow v0.8.2:
- Alleleome: Run Core-Alleleome to explore and analyze natural sequence variations within the Open Reading Frames (ORFs) of alleles of core genes in a species' pan-genome, at both the amino acid and nucleotide levels (Archana S. Harke et al., 2023). It can be run by providing the path to the Snakefile:
bgcflow run --workflow workflow/Alleleome
Running a comparative BGC workflow
This feature is used when you have a selection of antiSMASH BGC regions that you want to compare. Typically, you run it after the main workflow has finished.
1. Make a new project folder in config/<project_name> for those BGCs. An example config format is available here: https://github.com/NBChub/bgcflow/tree/dev-0.6.1/.examples/lanthipeptide
2. Prepare the samples CSV (see the example: https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/lanthipeptide/df_antismash_6.1.1_bgc.csv). It can be created by editing the results table from a previous run (tables/df_regions_antismash_6.1.1.csv). You then need to add these two columns:
   - source (for now, just write "bgcflow" as the source)
   - gbk_path (preferably an absolute path to the antiSMASH BGC region GenBank file; you can also use your own BGCs)
3. Create a project config file (see the example: https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/lanthipeptide/project_config.yaml). The latest available rules are listed here: https://github.com/NBChub/bgcflow/blob/dev-0.6.1/workflow/rules_bgc.yaml. The rules currently available are:
   - bigslice
   - query-bigslice
   - bigscape
   - clinker
   - interproscan
   - mmseqs2
4. Add the project to the global config file config/config.yaml under the bgc_projects variable (see https://github.com/NBChub/bgcflow/blob/dev-0.6.1/.examples/_config_example.yaml#L27-L28):
   bgc_projects:
     - name: config/<project_name>/project_config.yaml
5. You can then run the subworkflow, for example:
bgcflow run --snakefile workflow/BGC -c 2 -n
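The samples CSV step above can be scripted with Python's standard csv module. This is a minimal sketch under two loudly stated assumptions: the regions table has an identifier column named bgc_id, and each region's GenBank file is named <bgc_id>.gbk inside a single directory. Neither is confirmed by this page, so adjust both to match your own results. build_bgc_samples is a hypothetical helper, not part of BGCFlow.

```python
import csv
from pathlib import Path


def build_bgc_samples(regions_csv, output_csv, selected_ids, gbk_dir,
                      id_column="bgc_id"):
    """Write a samples CSV for the BGC subworkflow (hypothetical helper).

    Keeps only rows whose `id_column` value is in `selected_ids` and adds
    the two required columns:
      - source:   always "bgcflow" for now
      - gbk_path: absolute path to the region GenBank file
    ASSUMPTION: the `bgc_id` column name and `<bgc_id>.gbk` file naming
    are illustrative; change them to match your data.
    """
    wanted = set(selected_ids)
    gbk_dir = Path(gbk_dir).resolve()  # absolute paths are preferred

    with open(regions_csv, newline="") as fh:
        reader = csv.DictReader(fh)
        fieldnames = list(reader.fieldnames or [])
        if id_column not in fieldnames:
            raise KeyError(f"column {id_column!r} not found in {regions_csv}")
        for column in ("source", "gbk_path"):
            if column not in fieldnames:
                fieldnames.append(column)

        rows = []
        for row in reader:
            if row[id_column] not in wanted:
                continue
            row["source"] = "bgcflow"
            row["gbk_path"] = str(gbk_dir / f"{row[id_column]}.gbk")
            rows.append(row)

    # Fail early if a requested region is not in the table.
    missing = wanted - {row[id_column] for row in rows}
    if missing:
        raise ValueError(f"IDs not found in regions table: {sorted(missing)}")

    Path(output_csv).parent.mkdir(parents=True, exist_ok=True)
    with open(output_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

A call would look like build_bgc_samples("tables/df_regions_antismash_6.1.1.csv", "config/<project_name>/<samples>.csv", [...], "<directory with region .gbk files>"), with the placeholders filled in for your project. Raising on unknown IDs is a deliberate choice: a typo in a region ID would otherwise silently shrink the comparison.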
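Before running the subworkflow, it can help to check that the pieces from the setup steps are in place. The sketch below is a hypothetical pre-flight check, not a BGCFlow feature: it assumes the samples CSV lives inside config/<project_name>/, and it checks config/config.yaml with a plain-text search rather than a YAML parser, so it can miss unusual but valid formatting. Relative gbk_path values are reported as a soft warning because the page only says absolute paths are preferred.

```python
import csv
from pathlib import Path

REQUIRED_COLUMNS = ("source", "gbk_path")


def check_bgc_project(bgcflow_dir, project_name, samples_csv):
    """Return a list of issues found before running the BGC subworkflow.

    Heuristic checks only (hypothetical helper, not part of BGCFlow):
      1. config/<project_name>/project_config.yaml exists
      2. config/config.yaml mentions bgc_projects and that file
      3. the samples CSV has the source and gbk_path columns
      4. gbk_path values are absolute and point to existing files
    ASSUMPTION: `samples_csv` is relative to config/<project_name>/.
    """
    root = Path(bgcflow_dir)
    issues = []

    project_config = f"config/{project_name}/project_config.yaml"
    if not (root / project_config).is_file():
        issues.append(f"missing {project_config}")

    # Plain-text search, not a YAML parse.
    global_config = root / "config" / "config.yaml"
    text = global_config.read_text() if global_config.is_file() else ""
    if "bgc_projects:" not in text or project_config not in text:
        issues.append(f"{project_config} is not listed under bgc_projects")

    samples_path = root / "config" / project_name / samples_csv
    if not samples_path.is_file():
        issues.append(f"missing samples table {samples_path}")
        return issues

    with open(samples_path, newline="") as fh:
        reader = csv.DictReader(fh)
        columns = reader.fieldnames or []
        for column in REQUIRED_COLUMNS:
            if column not in columns:
                issues.append(f"samples table lacks column {column!r}")
        if "gbk_path" in columns:
            # Row numbers count the header as row 1.
            for row_no, row in enumerate(reader, start=2):
                gbk = Path(row["gbk_path"] or "")
                if not gbk.is_absolute():
                    issues.append(f"row {row_no}: gbk_path {gbk} is relative; "
                                  "absolute paths are preferred")
                elif not gbk.is_file():
                    issues.append(f"row {row_no}: {gbk} does not exist")
    return issues
```

An empty list means all four checks passed; anything else is printed back as a to-do list before starting the run.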