Cookiecutter - koppsteinlab/knowledge-repo GitHub Wiki
Cookiecutter
What is cookiecutter?
Cookiecutter is a command-line tool that helps you to quickly create projects from predefined templates. It's perfect for setting up Python packages and other types of projects with a consistent folder structure.
Our cookiecutter template is a fork of cookiecutter-bioinformatics-project and follows a folder structure similar to that of Snakemake workflows:
├── CITATION.cff       <- Contains metadata on how the project might eventually be published.
├── LICENSE
├── Makefile           <- Makefile with commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── config             <- Configuration options for the analysis.
│   ├── config.yaml    <- Snakemake config file.
│   └── samples.tsv    <- A metadata table for all the samples run in the analysis.
│
├── docs               <- A default Sphinx project; see sphinx-doc.org for details
│
├── environment.yaml   <- The requirements file for reproducing the analysis environment, e.g.
│                         generated with `conda env export > environment.yaml`
│
├── img                <- A place to store images associated with the project/pipeline, e.g.
│                         a figure of the pipeline DAG.
│
├── notebooks          <- Jupyter or Rmd notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
├── resources          <- Place for data. By default excluded from the git repository.
│   ├── external       <- Data from third party sources.
│   └── raw_data       <- The original, immutable data dump.
│
├── results            <- Final output of the data processing pipeline. By default excluded from the git repository.
│
├── sandbox            <- A place to test scripts and ideas. By default excluded from the git repository.
│
├── scripts            <- A place for short shell or python scripts.
│
├── setup.py           <- Makes project pip installable (pip install -e .) so src can be imported
│
├── src                <- Source code for use in this project.
│   └── __init__.py    <- Makes src a Python module
├── tox.ini            <- tox file with settings for running tox; see tox.readthedocs.io
│
├── workflow           <- Place to store the main pipeline for rerunning all the analysis.
│   ├── envs           <- Contains different conda environments in .yaml format for running the pipeline.
│   ├── rules          <- Contains .smk files that are included by the main Snakefile, including common.smk for functions.
│   ├── scripts        <- Contains different R or python scripts used by the script: directive in Snakemake.
│   └── Snakefile      <- Contains the main entrypoint to the pipeline.
│
└── workspace          <- Space for intermediate results in the pipeline. By default excluded from the git repository.
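The notebook naming convention from the tree above can be checked mechanically. The sketch below assumes lowercase initials and a lowercase, `-` delimited description, which the example name suggests but the template does not state. The pattern and the helper `parse_notebook_name` are hypothetical and not part of the template.

```python
import re
from pathlib import Path

# Assumed reading of the convention: <number>-<initials>-<description>,
# e.g. 1.0-jqp-initial-data-exploration (lowercase is an assumption).
NOTEBOOK_NAME = re.compile(
    r"^(?P<order>\d+(?:\.\d+)*)"
    r"-(?P<initials>[a-z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)$"
)

# Only these extensions are stripped; Path.stem would cut "1.0-..." at the first dot
# when the name has no extension.
KNOWN_SUFFIXES = (".ipynb", ".Rmd")


def parse_notebook_name(filename):
    """Return the parts of a notebook name, or None if it breaks the convention."""
    name = Path(filename).name
    for suffix in KNOWN_SUFFIXES:
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    match = NOTEBOOK_NAME.match(name)
    return match.groupdict() if match else None


print(parse_notebook_name("notebooks/1.0-jqp-initial-data-exploration.ipynb"))
```

A name such as `exploration.ipynb` yields `None`, which makes it easy to list notebooks that need renaming.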
Your main code with the different rules will be stored in a GNU Makefile, so that someone else can later run a single command, e.g. `make test`, to execute the whole pipeline.
Why should I use cookiecutter?
The goal is to have a standardized folder structure to ensure consistency and reproducibility across your different research projects.
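One way to check that a project still follows the standardized layout is to compare its top-level entries against the tree above. The following is a minimal sketch and not part of the template: the function name `missing_entries` and the choice of which entries count as required are assumptions. Folders described as excluded from the git repository are left out, since a fresh clone may not contain them.

```python
from pathlib import Path

# Top-level entries taken from the tree above. Treating all of them as
# required is an assumption of this sketch, not a rule of the template.
EXPECTED_ENTRIES = [
    "config",
    "docs",
    "notebooks",
    "references",
    "reports",
    "scripts",
    "src",
    "workflow",
    "Makefile",
    "README.md",
    "environment.yaml",
]


def missing_entries(project_dir, expected=EXPECTED_ENTRIES):
    """Return the expected top-level entries that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in expected if not (root / name).exists()]
```

Calling `missing_entries(".")` from the project root returns an empty list when every listed entry is present.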
Setting up the cookiecutter template on HILBERT
Here is a brief tutorial for setting up the cookiecutter template on HILBERT.
HILBERT is not directly connected to the internet, so there might be small differences in setting it up there compared to the DKFZ Cluster.
Make sure you have a package manager, e.g. `conda`, installed beforehand and have set the right channels (`conda-forge`, `bioconda`) and channel priorities in your `.condarc` file before following this tutorial. You can find a brief description on how to do this here.
Here is a step-by-step guide:
- Set up the cookiecutter environment.
conda create --name cookiecutter_env cookiecutter
- Activate your conda environment.
conda activate cookiecutter_env
- The cookiecutter template from the Koppstein Lab is located under the following path: `/gpfs/project/projects/KoppstBioCore/cookiecutter_template`. Don't touch this template folder! Go into your analysis folder in `KoppstBioCore`, e.g. with `cd analyses_username`.
- With the `cookiecutter_template` folder, you can now generate the predefined folder structure in your own project folder, e.g. with `cookiecutter ../cookiecutter_template`. In the activated cookiecutter environment, pass the path to the cookiecutter template, which contains the corresponding metadata and JSON file, as the argument. The path can be either absolute or relative. Call the command in the analysis folder where you would like to set up the project.
- Fill out the required entries, e.g. with the default values.
- Happy coding! :smiley:
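To illustrate why a relative and an absolute template path are interchangeable here, the sketch below resolves both against the analysis folder and builds the same `cookiecutter` call. The helper `cookiecutter_command` is hypothetical, and `analyses_username` is the placeholder folder name from the steps above. The function only builds the command and does not run it.

```python
import posixpath
import shlex


def cookiecutter_command(template, working_dir):
    """Build the cookiecutter call for a template path given relative to working_dir."""
    # posixpath.join ignores working_dir when template is already absolute.
    # normpath collapses ".." lexically, so symlinked folders are not followed.
    resolved = posixpath.normpath(posixpath.join(working_dir, template))
    return ["cookiecutter", resolved]


core = "/gpfs/project/projects/KoppstBioCore"
analysis_dir = f"{core}/analyses_username"

relative = cookiecutter_command("../cookiecutter_template", analysis_dir)
absolute = cookiecutter_command(f"{core}/cookiecutter_template", analysis_dir)

# Both variants point at the same template folder.
print(shlex.join(relative))
print(relative == absolute)
```

The resulting list could be passed to `subprocess.run` inside the activated environment, although typing the command directly in the shell, as in the steps above, is usually simpler.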
Setting up the cookiecutter template on the DKFZ Cluster
- Make sure you have the cookiecutter environment installed. It is assumed that you've already set up `conda` and its channels as described above.
conda create --name cookiecutter_env cookiecutter
- Activate your conda environment.
conda activate cookiecutter_env
- Go into your analysis folder.
- Execute the following command inside the activated conda environment:
cookiecutter gh:koppsteinlab/cookiecutter-bioinformatics-project
- To create your project from the template, fill out the required entries, e.g. with the default values.
- Happy coding! :smiley:
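The same steps can also be scripted through cookiecutter's Python API instead of the command line. The following is a minimal sketch under stated assumptions: the wrapper `create_project` and its parameter names are hypothetical, and it has to run inside the activated `cookiecutter_env` on a machine with internet access. Passing `no_input=True` skips the prompts and accepts the template's default values.

```python
# Template abbreviation used in the command above.
TEMPLATE = "gh:koppsteinlab/cookiecutter-bioinformatics-project"


def create_project(output_dir=".", use_defaults=True, template=TEMPLATE):
    """Generate a project from the template inside output_dir."""
    # Imported lazily so this file can be loaded without cookiecutter installed.
    from cookiecutter.main import cookiecutter

    # no_input=True accepts the defaults from the template's JSON file;
    # set use_defaults=False to answer the prompts interactively.
    return cookiecutter(template, no_input=use_defaults, output_dir=output_dir)
```

For example, `create_project(output_dir="analyses_username")` would generate the project with default values inside that folder.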