Executing Python code - RomainFeron/workshop-snakemake-sibdays2020 GitHub Wiki
The 'run' directive
So far, every example rule used the shell directive, introduced in the Defining rules section, to run a shell command. Instead of a shell command, a rule can run Python code with the run directive. Inside the run block, the rule's directives (input, output, params, and so on) are available as Python variables, just as they are available as placeholders in the shell directive.
Let us recall the last example rule implemented in the Parameters section:
rule first_step:
    input:
        'data/first_step.tsv'
    output:
        'results/first_step.txt'
    params:
        lines = 5
    shell:
        'head -n {params.lines} {input} > {output}'
Although it would not be very useful in this case, we could replace the shell command with Python code:
rule first_step:
    input:
        'data/first_step.tsv'
    output:
        'results/first_step.txt'
    params:
        lines = 5
    run:
        # Copy the first params.lines lines of the input to the output
        with open(input[0]) as input_file, open(output[0], 'w') as output_file:
            for i in range(params.lines):
                output_file.write(input_file.readline())
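Snakemake provides a built-in shell() function to submit a shell command from within a run block. Called with read=True (or iterable=True), it returns the command's standard output; by default it only runs the command. It takes the command as a string, with the same placeholders as the shell directive: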
shell('command {input} {output} {params}')
The run directive can be useful to design wrappers around shell commands or implement small functions for which there is no simple existing software. However, for functions involving complex Python code, the script approach is preferred.
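As a sketch of such a wrapper, the rule below checks a parameter in Python before handing the actual work to shell(). It is a Snakefile fragment that reuses the first_step rule from above, so it runs under Snakemake rather than as standalone Python. The check on params.lines is a hypothetical addition for illustration only.

```python
rule first_step:
    input:
        'data/first_step.tsv'
    output:
        'results/first_step.txt'
    params:
        lines = 5
    run:
        # Plain Python check before delegating to the shell (hypothetical)
        if params.lines < 1:
            raise ValueError('params.lines must be at least 1')
        # Placeholders are filled in exactly as in the shell directive
        shell('head -n {params.lines} {input} > {output}')
```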
The 'script' directive
The script directive replaces shell or run and is used to run external scripts. These scripts can be in Python, R, R Markdown, or Julia. We will only cover Python within this workshop; for more information about scripts in other languages, refer to the scripts section of the official documentation.
To call a script from a rule, specify the path to the script relative to the snakefile that defines the rule. Within a Python script, all variables and directives from the calling rule are accessible through a snakemake Python object, for example snakemake.input, snakemake.output, and snakemake.params. This object also exposes the job's wildcards (snakemake.wildcards) and the workflow's config parameters (snakemake.config).
The following example modifies the rule implemented in the previous section (the run directive) to use an external script. Assuming the script first_step.py is located in the same directory as the snakefile, the rule would then become:
rule first_step:
    input:
        'data/first_step.tsv'
    output:
        'results/first_step.txt'
    params:
        lines = 5
    script:
        'first_step.py'
The script first_step.py would contain the following code:
# Retrieve information from Snakemake
n_lines = snakemake.params.lines
# Process file: copy the first n_lines lines, closing both files when done
with open(snakemake.input[0]) as input_file, open(snakemake.output[0], 'w') as output_file:
    for i in range(n_lines):
        output_file.write(input_file.readline())
Tip: it is often useful to have a script that can be called by Snakemake but also works with standard Python, so that the code can be reused in other projects. There are several ways to do that:
- You could implement most of the functionality in a module and import this module in a simple script called by Snakemake.
- You could test for the existence of a snakemake object and handle parameter values differently (e.g. command-line arguments) if the object does not exist.
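The two approaches from the tip can be combined in a single file. The following is a minimal sketch and not part of the original workshop material: the function names head, parse_cli, and get_arguments, as well as the command-line options, are hypothetical. The reusable logic lives in a plain function, and a small helper checks whether a snakemake object exists before falling back to command-line arguments.

```python
import argparse
from itertools import islice


def head(input_path, output_path, n_lines):
    """Copy the first n_lines lines of input_path to output_path."""
    with open(input_path) as input_file, open(output_path, 'w') as output_file:
        output_file.writelines(islice(input_file, n_lines))


def parse_cli(argv=None):
    """Read the same three values from the command line (hypothetical options)."""
    parser = argparse.ArgumentParser(description='Keep the first lines of a file.')
    parser.add_argument('input')
    parser.add_argument('output')
    parser.add_argument('--lines', type=int, default=5)
    args = parser.parse_args(argv)
    return args.input, args.output, args.lines


def get_arguments(argv=None):
    """Use the snakemake object if it exists, the command line otherwise."""
    try:
        job = snakemake  # this name is defined only when Snakemake runs the script
    except NameError:
        return parse_cli(argv)
    return job.input[0], job.output[0], job.params.lines


# A real script would end with: head(*get_arguments())
```

With that final call in place, the same file could serve as the target of the script directive and be run by hand, for instance as python first_step.py data/first_step.tsv results/first_step.txt --lines 5. Keeping head() free of any Snakemake dependency also makes it possible to import it from other projects.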