4.2. Remove an existing bioinformatics tool from ProkEvo

To demonstrate how to remove an existing program from ProkEvo, we use SISTR as an example. SISTR applies only to Salmonella genomes, so this step should be removed when non-Salmonella genomes are analyzed.

In ProkEvo, SISTR is part of the second sub-pipeline, so its dependencies are defined at the end of the file sub-dax.py, in the Section "Add control-flow dependencies":

[centos@npavlovikj-prokevo ProkEvo]$ cat sub-dax.py
...
for i in range(0,length):
    # Add control-flow dependencies
    dax.addDependency(Dependency(parent=plasmidfinder_run[i], child=ls_run))
    dax.addDependency(Dependency(parent=prokka_run[i], child=roary_run))
    # COMMENT OUT THE LINE BELOW TO SKIP SISTR IF NON SALMONELLA ORGANISM IS USED!!!
    dax.addDependency(Dependency(parent=sistr_run[i], child=cat))
dax.addDependency(Dependency(parent=mlst_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_argannot_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_card_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_ncbi_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_plasmidfinder_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_resfinder_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_vfdb_run, child=ls_run))
dax.addDependency(Dependency(parent=roary_run, child=fastbaps_run))
dax.addDependency(Dependency(parent=cat, child=merge_sistr_run))
...

To remove SISTR, or any other program, from the pipeline, two parts need to be removed: the task for the tool itself, together with the tasks for all steps that depend on it, and the dependency lines that connect the tool to the steps run before or after it.
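
The rule above can be sketched in plain Python, without Pegasus, to work out what has to go for a given tool. This is only an illustration: `removal_plan` and `EDGES` are hypothetical names, the edge list is a shortened copy of the listing above (the abricate edges are left out), and per-genome jobs such as `sistr_run[i]` are collapsed to a single name.

```python
from collections import defaultdict

# Edges (parent, child) copied from the "Add control-flow dependencies" listing.
# Assumption: per-genome jobs like sistr_run[i] are treated as one node.
EDGES = [
    ("plasmidfinder_run", "ls_run"),
    ("prokka_run", "roary_run"),
    ("sistr_run", "cat"),
    ("mlst_run", "ls_run"),
    ("roary_run", "fastbaps_run"),
    ("cat", "merge_sistr_run"),
]


def removal_plan(edges, tool):
    """Return (jobs to remove, dependency edges to comment out) for `tool`."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Collect the tool and every job that runs after it.
    to_remove, stack = set(), [tool]
    while stack:
        job = stack.pop()
        if job not in to_remove:
            to_remove.add(job)
            stack.extend(children[job])

    # Any edge that touches a removed job can no longer stay active.
    dead_edges = [e for e in edges if e[0] in to_remove or e[1] in to_remove]
    return to_remove, dead_edges


jobs, dead = removal_plan(EDGES, "sistr_run")
print(sorted(jobs))  # ['cat', 'merge_sistr_run', 'sistr_run']
print(dead)          # [('sistr_run', 'cat'), ('cat', 'merge_sistr_run')]
```

Removing everything downstream is a simplification that fits the jobs shown here; a downstream job that also has other parents would need to be looked at by hand.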

In the setup of ProkEvo, no program is run before SISTR, but a concatenation step, "cat", is run after SISTR and is itself followed by merge_sistr_run, so the following two lines should be commented out:

...
dax.addDependency(Dependency(parent=sistr_run[i], child=cat))
...
dax.addDependency(Dependency(parent=cat, child=merge_sistr_run))
...

The line dax.addJob(sistr_run[i]) should be commented out as well.

After these changes, the Section "Add control-flow dependencies" from the file sub-dax.py should look like this:

[centos@npavlovikj-prokevo ProkEvo]$ cat sub-dax.py
...
for i in range(0,length):
    # Add control-flow dependencies
    dax.addDependency(Dependency(parent=plasmidfinder_run[i], child=ls_run))
    dax.addDependency(Dependency(parent=prokka_run[i], child=roary_run))
    # COMMENT OUT THE LINE BELOW TO SKIP SISTR IF NON SALMONELLA ORGANISM IS USED!!!
    # dax.addDependency(Dependency(parent=sistr_run[i], child=cat))
dax.addDependency(Dependency(parent=mlst_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_argannot_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_card_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_ncbi_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_plasmidfinder_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_resfinder_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_vfdb_run, child=ls_run))
dax.addDependency(Dependency(parent=roary_run, child=fastbaps_run))
# dax.addDependency(Dependency(parent=cat, child=merge_sistr_run))
...

With these changes, the SISTR step is removed, and the researcher can submit the workflow.
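
For repeated use, the manual edits could also be scripted. The following is a minimal sketch, not part of ProkEvo: `comment_out_jobs` is a hypothetical helper, and `sample` is a made-up, abbreviated excerpt (the position of the `dax.addJob` line inside the loop is an assumption). It comments out every active `dax.addJob` or `dax.addDependency` line that mentions one of the given job names and keeps the original indentation.

```python
import re


def comment_out_jobs(source, job_names):
    """Comment out active addJob/addDependency lines that mention any job name."""
    # \b keeps "cat" from matching inside identifiers such as "abricate_card_run".
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, job_names)))
    out = []
    for line in source.splitlines():
        stripped = line.lstrip()
        is_dax_call = stripped.startswith(("dax.addJob(", "dax.addDependency("))
        if is_dax_call and pattern.search(stripped):
            indent = line[: len(line) - len(stripped)]
            line = indent + "# " + stripped
        out.append(line)
    return "\n".join(out)


# Hypothetical excerpt, only loosely modelled on sub-dax.py.
sample = """\
for i in range(0,length):
    dax.addJob(sistr_run[i])
    dax.addDependency(Dependency(parent=prokka_run[i], child=roary_run))
    dax.addDependency(Dependency(parent=sistr_run[i], child=cat))
dax.addDependency(Dependency(parent=abricate_card_run, child=ls_run))
dax.addDependency(Dependency(parent=cat, child=merge_sistr_run))
"""

print(comment_out_jobs(sample, ["sistr_run", "cat", "merge_sistr_run"]))
```

The sketch works on a string, so the result can be reviewed before anything is written back to sub-dax.py. Note that commenting out every statement inside the for loop would leave an empty loop body, which Python does not accept.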

Another example of a program that may need to be removed is Roary. In ProkEvo, Roary is also part of the second sub-pipeline, so its dependencies are defined at the end of the file sub-dax.py, in the Section "Add control-flow dependencies":

[centos@npavlovikj-prokevo ProkEvo]$ cat sub-dax.py
...
for i in range(0,length):
    # Add control-flow dependencies
    dax.addDependency(Dependency(parent=plasmidfinder_run[i], child=ls_run))
    dax.addDependency(Dependency(parent=prokka_run[i], child=roary_run))
    # COMMENT OUT THE LINE BELOW TO SKIP SISTR IF NON SALMONELLA ORGANISM IS USED!!!
    dax.addDependency(Dependency(parent=sistr_run[i], child=cat))
dax.addDependency(Dependency(parent=mlst_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_argannot_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_card_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_ncbi_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_plasmidfinder_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_resfinder_run, child=ls_run))
dax.addDependency(Dependency(parent=abricate_vfdb_run, child=ls_run))
dax.addDependency(Dependency(parent=roary_run, child=fastbaps_run))
dax.addDependency(Dependency(parent=cat, child=merge_sistr_run))
...

Similarly, to remove Roary, or any other program, from the pipeline, two parts need to be removed: the task for the tool itself, together with the tasks for all steps that depend on it, and the dependency lines that connect the tool to the steps run before or after it.

In the setup of ProkEvo, Prokka is run before Roary and BAPS (fastbaps) is run after Roary, so both dependency lines that reference roary_run should be commented out:

...
    dax.addDependency(Dependency(parent=prokka_run[i], child=roary_run))
...
dax.addDependency(Dependency(parent=roary_run, child=fastbaps_run))
...

The lines dax.addJob(roary_run) and dax.addJob(fastbaps_run) should be commented out as well.

With these changes, the Roary and BAPS steps are removed, and the researcher can submit the workflow.
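
Because the listing above also contains a dependency from prokka_run[i] to roary_run, it may be worth checking before submission that no active dependency still refers to a job that is no longer added; such a leftover reference may cause an error when the workflow is generated. The sketch below is a hypothetical check, not a ProkEvo or Pegasus feature: `dangling_dependencies` and the `edited` excerpt are made up for illustration, and the regular expressions assume the one-call-per-line style shown in the listings.

```python
import re

# Match only active lines: a leading "#" prevents a match.
ADD_JOB = re.compile(r"^\s*dax\.addJob\((\w+)")
ADD_DEP = re.compile(
    r"^\s*dax\.addDependency\(Dependency\(parent=(\w+)[^,]*,\s*child=(\w+)"
)


def dangling_dependencies(source):
    """List active dependencies whose parent or child has no active addJob line."""
    added, deps = set(), []
    for line in source.splitlines():
        job = ADD_JOB.match(line)
        dep = ADD_DEP.match(line)
        if job:
            added.add(job.group(1))
        elif dep:
            deps.append((dep.group(1), dep.group(2)))
    return [d for d in deps if d[0] not in added or d[1] not in added]


# Hypothetical excerpt in which the Prokka -> Roary dependency was left active.
edited = """\
for i in range(0,length):
    dax.addJob(prokka_run[i])
    dax.addDependency(Dependency(parent=prokka_run[i], child=roary_run))
# dax.addJob(roary_run)
# dax.addJob(fastbaps_run)
# dax.addDependency(Dependency(parent=roary_run, child=fastbaps_run))
"""

print(dangling_dependencies(edited))  # [('prokka_run', 'roary_run')]
```

An empty list means that every active dependency refers only to jobs that are still added in the scanned text.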

More information about the dependency section and its syntax in the Python script can be found in the Pegasus documentation.