02 Running and Monitoring Jobs - NBChub/bgcflow GitHub Wiki
Executing the main workflow
BGCFlow can be executed as a normal Snakemake workflow, or using the run command from the bgcflow_wrapper.
$ bgcflow run --help
Usage: bgcflow run [OPTIONS]
A snakemake CLI wrapper to run BGCFlow. Automatically run panoptes.
Options:
--bgcflow_dir TEXT Location of BGCFlow directory. (DEFAULT: Current working directory.)
--workflow TEXT Select which snakefile to run. Available subworkflows: {BGC|Database|Report|Metabase}. (DEFAULT: workflow/Snakefile)
--wms-monitor TEXT Panoptes address. (DEFAULT: http://127.0.0.1:5000)
-c, --cores INTEGER Use at most N CPU cores/jobs in parallel. (DEFAULT: 8)
-n, --dryrun Test run.
--unlock Remove a lock on the snakemake working directory.
--until TEXT Runs the pipeline until it reaches the specified rules or files.
-t, --touch Touch output files (mark them up to date without really changing them).
-h, --help Show this message and exit.
Running bgcflow run will execute the main Snakemake workflow from the Snakefile located in workflow/Snakefile. It is common to execute a dry run (-n) before submitting the real jobs, to make sure the configuration does not contain errors.
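The dry-run-first habit can be wrapped in a small shell function. The sketch below is only an illustration: the function name run_bgcflow_checked is hypothetical, and it assumes that bgcflow run -n exits with a non-zero status when the configuration is broken, which this page does not confirm. Only the -n and -c options from the help text above are used.

```shell
# Hypothetical helper (not part of BGCFlow): do a dry run first and start the
# real run only if the dry run exits successfully.
# Assumption: `bgcflow run -n` returns a non-zero exit status on config errors.
# Usage: run_bgcflow_checked [CORES]
run_bgcflow_checked() {
    cores="${1:-8}"   # same default as the wrapper's --cores option
    bgcflow run -n -c "$cores" && bgcflow run -c "$cores"
}
```

Calling run_bgcflow_checked 4 would run the dry run and, if it succeeds, the real workflow with at most 4 cores.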
A dry run returns output similar to this:
$ bgcflow run -n
DEBUG 15/08 15:32:36 Starting new HTTP connection (1): 127.0.0.1:5000
Running Panoptes to monitor BGCFlow jobs at http://127.0.0.1:5000
Panoptes job id: 20679
Connecting to Panoptes...
DEBUG 15/08 15:32:36 Starting new HTTP connection (1): 127.0.0.1:5000
Retrying to connect: 1x
* Serving Flask app 'panoptes.app'
* Debug mode: off
DEBUG 15/08 15:32:37 Starting new HTTP connection (1): 127.0.0.1:5000
DEBUG 15/08 15:32:37 http://127.0.0.1:5000 "GET /api/service-info HTTP/1.1" 200 21
Panoptes status: running
cd . && snakemake --snakefile workflow/Snakefile --use-conda --keep-going --rerun-incomplete --rerun-triggers mtime -c 8 --dryrun --wms-monitor http://127.0.0.1:5000
This is BGCflow version 0.7.1.
Checking dependencies...
Found configuration setting to use antiSMASH 7
antismash from: workflow/envs/antismash.yaml
- antismash will be installed from git+https://github.com/antismash/antismash.git
- antismash==7.0.0
bigslice from: workflow/envs/bigslice.yaml
- bigslice will be installed from git+https://github.com/medema-group/bigslice.git
- bigslice==103d8f2
cblaster from: workflow/envs/cblaster.yaml
- cblaster will be installed using pip
- cblaster==1.3.12
prokka from: workflow/envs/prokka.yaml
- prokka==1.14.6
eggnog-mapper from: workflow/envs/eggnog.yaml
- eggnog-mapper==2.1.6
roary from: workflow/envs/roary.yaml
- roary==3.13.0
seqfu from: workflow/envs/seqfu.yaml
- seqfu==1.15.3
checkm from: workflow/envs/checkm.yaml
- checkm==1.1.3
gtdbtk from: workflow/envs/gtdbtk.yaml
- gtdbtk==2.3.0
Step 1. Extracting project information from config...
Step 2.1 Getting sample information from: config/lactobacillus_delbruecki/project_config.yaml
- Processing project [Lactobacillus_delbrueckii]
- Custom input directory: False
- Getting input files from: /data/a/matinnu/bgcflow/data/raw/fasta
- Custom input format: False
- Default input file type: fna
Step 3 Merging genome_ids across projects...
Step 4. Checking for user-defined local resources...
All resources set.
Step 5. Preparing list of final outputs...
- Getting outputs for project: Lactobacillus_delbrueckii
- WARNING: ignoring errors in rule_dictionary
- Ready to generate all outputs.
GTDB API | Grabbing metadata using GTDB release version: r214
Building DAG of jobs...
...
See the Snakemake documentation for further details of the Snakemake CLI.
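Since the wrapper prints the Snakemake command it runs (the line starting with cd . && snakemake in the output above), the same run can likely be reproduced without the wrapper. The sketch below copies the flags from that log line; the function name run_bgcflow_snakemake is hypothetical, and it assumes the current directory is the BGCFlow directory. The --wms-monitor flag is left out, so Panoptes is not contacted unless it is passed as an extra argument.

```shell
# Hypothetical helper (not part of BGCFlow): call Snakemake directly with the
# flags the wrapper printed in the dry-run log above.
# Usage: run_bgcflow_snakemake [CORES] [EXTRA SNAKEMAKE ARGS...]
run_bgcflow_snakemake() {
    cores="${1:-8}"
    # Drop the first argument (cores) so "$@" holds only the extra arguments.
    if [ "$#" -gt 0 ]; then shift; fi
    snakemake --snakefile workflow/Snakefile \
        --use-conda --keep-going --rerun-incomplete \
        --rerun-triggers mtime -c "$cores" "$@"
}
```

For example, run_bgcflow_snakemake 8 --dryrun would correspond to the dry run shown above, minus the monitoring flag.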
Monitoring the workflow
By default, each time a workflow is run, BGCFlow will start monitoring jobs using Panoptes, which can be accessed at http://localhost:5000/.
Once the run has finished, the monitoring server is shut down as well. To keep it available, we can serve Panoptes independently by using:
bgcflow serve --panoptes
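When Panoptes is served independently, it can be handy to confirm that it is reachable before starting a run. The sketch below polls the /api/service-info endpoint that appears in the run log earlier on this page. The function names panoptes_is_up and wait_for_panoptes are hypothetical, curl is assumed to be installed, and the address defaults to the one shown in the help text.

```shell
# Hypothetical helper (not part of BGCFlow): succeed if Panoptes answers at
# the given address. /api/service-info is the endpoint seen in the run log.
panoptes_is_up() {
    address="${1:-http://127.0.0.1:5000}"
    curl --silent --fail --max-time 2 --output /dev/null \
        "$address/api/service-info"
}

# Hypothetical helper: poll once per second until Panoptes responds or the
# given number of attempts (default 10) is used up.
wait_for_panoptes() {
    address="${1:-http://127.0.0.1:5000}"
    attempts="${2:-10}"
    while [ "$attempts" -gt 0 ]; do
        if panoptes_is_up "$address"; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}
```

One possible use is to start bgcflow serve --panoptes in one terminal, then run wait_for_panoptes && bgcflow run in another.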
The command bgcflow serve is a utility for starting the various servers that we will explore in the next section.
$ bgcflow serve --help
Usage: bgcflow serve [OPTIONS]
Serve static HTML report or other utilities (Metabase, etc.).
Options:
--port_markdown INTEGER Port to use. (DEFAULT: 8001)
--port_panoptes INTEGER Port to use. (DEFAULT: 8001)
--file_server TEXT Port to use for fileserver. (DEFAULT: http://localhost:8002)
--bgcflow_dir TEXT Location of BGCFlow directory. (DEFAULT: Current working directory)
--metabase Run Metabase server at http://localhost:3000. Requires Java to be installed. See: https://www.metabase.com/docs/latest/installation-and-operation/java-versions
--panoptes Run Panoptes server to monitor workflow at http://localhost:5000
--project TEXT Name of the project. (DEFAULT: all)
-h, --help Show this message and exit.