Generating test suites with Snakemake
Overview
Here we present a way to generate trees over a varying set of parameters. Our Snakemake library provides the functionality to generate several projects, each one having simulated data for a single combination of parameters. There are two ways to use our library:
- Use the zombiSnakemake command to generate a suite of test datasets in your project directory.
- Import our library in your Snakefile.
Using zombiSnakemake
Call zombiSnakemake my_simulation_dir and follow the
instructions printed to the terminal. This creates a directory called my_simulation_dir,
along with my_simulation_dir/Snakefile and the default
config and parameter files. To test your setup, run snakemake -c 1
from inside the my_simulation_dir directory.
Accessing our library from a Snakefile
- Install Zombi in your mamba environment.
- Import the library by adding these lines to your Snakefile:
      from zombi.snakemake.parameters import ZOMBI_EXPORT_SNAKEFILE
      include: ZOMBI_EXPORT_SNAKEFILE
- Use helper lists like ZOMBIPARAMDIRS and ZOMBIPARAMDIRS_NOREPS to write rules that depend on the Zombi output files.
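As a rough illustration of how a helper list might feed a rule's inputs, the sketch below turns a list of project directories into one target file per directory. It assumes that ZOMBIPARAMDIRS holds project directory paths as plain strings, which this page does not confirm; `targets_in`, `example_dirs`, and the file name `my_summary.tsv` are hypothetical names used only for this example.

```python
import posixpath


def targets_in(param_dirs, filename):
    """Return one target path per project directory.

    Hypothetical helper, not part of Zombi. Assumes each entry of
    param_dirs is a directory path given as a string.
    """
    return [posixpath.join(directory, filename) for directory in param_dirs]


# Stand-in for ZOMBIPARAMDIRS (assumed to be a list of directory paths).
example_dirs = [
    "simulations/sequences/treeparams-T-rep0/genomeparams-G-rep0/sequenceparams-S-rep0/",
    "simulations/sequences/treeparams-T-rep1/genomeparams-G-rep0/sequenceparams-S-rep0/",
]

targets = targets_in(example_dirs, "my_summary.tsv")
for target in targets:
    print(target)
```

If the assumption holds, a Snakefile could pass the result of such a helper as the input of an aggregating rule, so that every simulated project is built before downstream analysis runs.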
Directory structure of simulated data
Each set of simulated parameter values yields a new Zombi project directory, containing all of the files resulting from a normal Zombi run. The path to this directory has the following structure (by default):
simulations/sequences/treeparams-{TMODE}-rep{X}/{TREE_PARAMS}/genomeparams-{GMODE}-rep{Y}/{GENOME_PARAMS}/sequenceparams-{SMODE}-rep{Z}/{SEQUENCE_PARAMS}/
where X, Y, and Z are replicate numbers, and TMODE, GMODE, and SMODE are the
modes under which each command was run (e.g. GMODE is one of {G, Gu, Gf, Gm}).
Each of TREE_PARAMS, GENOME_PARAMS, and SEQUENCE_PARAMS is a path
containing one directory for each non-default simulation parameter. Each such
directory is named after the parameter, as it appears in the config file, followed by its value,
separated by a dash (-). Therefore, a test suite where none of the default
parameters have been changed would produce a project directory
simulations/sequences/treeparams-T-rep0/genomeparams-G-rep0/sequenceparams-S-rep0/.
If, for example, the parameter TOTAL_LINEAGES was set to (non-default) 5, and TANDEMDUP
was set to (non-default) f:0, then we would see
simulations/sequences/treeparams-T-rep0/TOTAL_LINEAGES-5/genomeparams-G-rep0/TANDEMDUP-f:0/sequenceparams-S-rep0/.
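The naming scheme described above can be sketched in a few lines of Python. This is only an illustration of the path layout, not Zombi's implementation: `project_dir` and its arguments are hypothetical, and the insertion order of each dictionary stands in for the parameter order of the config files.

```python
def project_dir(tree=None, genome=None, sequence=None,
                modes=("T", "G", "S"), reps=(0, 0, 0),
                root="simulations/sequences"):
    """Build a project directory path from non-default parameters.

    Hypothetical helper. Each of tree, genome, and sequence is a dict
    mapping a non-default parameter name to its value.
    """
    parts = [root]
    stages = zip(("treeparams", "genomeparams", "sequenceparams"),
                 modes, reps, (tree, genome, sequence))
    for prefix, mode, rep, params in stages:
        # One directory naming the stage, its mode, and its replicate.
        parts.append(f"{prefix}-{mode}-rep{rep}")
        # One directory per non-default parameter: NAME-VALUE.
        for name, value in (params or {}).items():
            parts.append(f"{name}-{value}")
    return "/".join(parts) + "/"


default_path = project_dir()
custom_path = project_dir(tree={"TOTAL_LINEAGES": 5},
                          genome={"TANDEMDUP": "f:0"})
print(default_path)
print(custom_path)
```

With no arguments the sketch reproduces the all-defaults path, and with TOTAL_LINEAGES and TANDEMDUP set it reproduces the second example path.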
[!TIP] The directory names appear in these paths in the same order as the parameters in the default parameter files of the Zombi installation, so the paths are predictable and stable, and they follow the general Snakemake paradigm (i.e. parameters are stored in path names).
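Because the parameters live in the path, a downstream script can recover them without reading any config file. The following is a minimal sketch of such a parser, assuming that parameter names contain no dash (so the first dash separates name from value) and that mode names are alphanumeric; `parse_project_dir` is a hypothetical function, not part of Zombi.

```python
import re

# Matches stage directories such as "genomeparams-Gu-rep2".
STAGE = re.compile(r"^(treeparams|genomeparams|sequenceparams)-(\w+)-rep(\d+)$")


def parse_project_dir(path):
    """Split a project path into per-stage mode, replicate, and parameters."""
    stages = {}
    current = None
    for part in path.strip("/").split("/"):
        match = STAGE.match(part)
        if match:
            current = {"mode": match.group(2),
                       "rep": int(match.group(3)),
                       "params": {}}
            stages[match.group(1)] = current
        elif current is not None:
            # Parameter directory: NAME-VALUE, split at the first dash.
            name, _, value = part.partition("-")
            current["params"][name] = value
        # Components before the first stage directory are ignored.
    return stages


parsed = parse_project_dir(
    "simulations/sequences/treeparams-T-rep0/TOTAL_LINEAGES-5/"
    "genomeparams-G-rep0/TANDEMDUP-f:0/sequenceparams-S-rep0/"
)
print(parsed["treeparams"]["params"])
print(parsed["genomeparams"]["params"])
```

Values are returned as strings, since the path alone does not say whether a value such as 5 is meant as a number.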