Applied Modeling Project Structure - EpiModel/EpiModeling GitHub Wiki

This page describes the structure of an EpiModel Applied Modeling Project, specifically the contents of the project repo.

We assume that a researchProj repo has been created from EpiModelHIV-Template. See Getting Started with EpiModelHIV.

researchProj Root

The root of the project is the top-level directory. Inside, the files are organized into various sub-directories.

researchProj/
├── R/
├── data/
├── workflows/
├── README.md
├── renv.lock
└── researchProj.Rproj

Mimicking the structure of R packages, the R/ directory will contain all the R scripts used by the project.
The data directory contains files used by the R scripts as well as files created by running the scripts.
The workflows/ directory will contain the workflow directories used to run code on High Performance Computing (HPC) systems using the slurmworkflow package.
The README.md file should describe the purpose of the code and link to the published article once the project is finished.
The renv.lock file contains the list of packages used by the project with their respective versions.
The researchProj.Rproj is the RStudio project file.
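As a quick sanity check, the layout above can be verified with a few lines of base R. This is only a sketch: `check_project_layout()` is a hypothetical helper written for this page, not a function from EpiModel or the template.

```r
# Sketch: report which expected top-level items are missing from a project root.
# `check_project_layout()` is a hypothetical helper, not part of any package.
check_project_layout <- function(root) {
  expected_dirs <- c("R", "data", "workflows")
  expected_files <- c("README.md", "renv.lock")
  dirs_ok <- dir.exists(file.path(root, expected_dirs))
  files_ok <- file.exists(file.path(root, expected_files))
  # The .Rproj file is named after the project, so match it by extension
  rproj_ok <- length(list.files(root, pattern = "\\.Rproj$")) == 1
  c(
    expected_dirs[!dirs_ok],
    expected_files[!files_ok],
    if (!rproj_ok) "*.Rproj"
  )
}

# Demonstration on a throwaway directory
root <- tempfile("researchProj_")
dir.create(root)
for (d in c("R", "data", "workflows")) dir.create(file.path(root, d))
file.create(file.path(root, c("README.md", "renv.lock", "researchProj.Rproj")))
missing_items <- check_project_layout(root)
```

An empty `missing_items` vector means the root matches the expected layout.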

R Scripts in the R/ Directory

Modeling projects are complex and involve a lot of scripts. The scripts follow naming conventions that make them easier to navigate.

R/
├── 00-setup_packages.R
├── 01-networks_estimation.R
├── 02-networks_diagnostics.R
├── ...
├── 10-calibration_sim.R
├── 11-calibration_process.R
├── 12-calibration_eval.R
├── ...
├── utils-0_project_settings.R
├── utils-targets.R
├── ...
├── workflow_01-networks_estimation.R
├── workflow_02-model_calibration.R
├── ...
└── z-test.R

First come the numbered scripts, whose names start with a two-digit number (e.g. 01-networks_estimation.R). These are the steps of the project and are meant to be run in order to produce the full analysis. To organise the project better, the first digit groups the scripts into the major parts of the project:

0x-scripts.R: estimation of the network objects
1x-scripts.R: calibration of the epidemic model
...
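The numbering convention makes it easy to list the steps in run order and group them by stage. The snippet below is a minimal sketch using base R only; `list_numbered_scripts()` is a hypothetical helper, and the placeholder files simply reuse the names from the listing above.

```r
# Sketch: list the numbered scripts in run order and group them by stage.
# `list_numbered_scripts()` is a hypothetical helper, not part of any package.
list_numbered_scripts <- function(r_dir) {
  # Keep only files that start with two digits and a dash
  scripts <- sort(list.files(r_dir, pattern = "^[0-9]{2}-.*\\.R$"))
  # The first digit identifies the stage (0x = networks, 1x = calibration, ...)
  stage <- paste0(substr(scripts, 1, 1), "x")
  split(scripts, stage)
}

# Demonstration with empty placeholder files
r_dir <- tempfile("R_")
dir.create(r_dir)
file.create(file.path(r_dir, c(
  "00-setup_packages.R", "01-networks_estimation.R", "02-networks_diagnostics.R",
  "10-calibration_sim.R", "11-calibration_process.R", "12-calibration_eval.R",
  "utils-targets.R", "workflow_01-networks_estimation.R", "z-test.R"
)))
stages <- list_numbered_scripts(r_dir)
print(stages)
```

The utility, workflow and test scripts are ignored because their names do not start with two digits.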

Then come the utility scripts, whose names start with utils-. They contain code that is used by several numbered scripts. They should never be run on their own; instead, they are sourced by the numbered scripts that require them. This helps us follow the DRY principle (Don't Repeat Yourself).
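The sourcing pattern can be sketched as follows. The file names come from the listing above, but their contents here (`n_cores`, `calib_dir`) are invented for illustration and are not the actual settings used by the template.

```r
# Sketch of the sourcing pattern, using a throwaway project directory.
# The settings `n_cores` and `calib_dir` are hypothetical examples.
proj <- tempfile("researchProj_")
dir.create(file.path(proj, "R"), recursive = TRUE)

# A utility script only defines shared objects; it is never run on its own
writeLines(c(
  "n_cores <- 4",
  "calib_dir <- file.path('data', 'intermediate', 'calibration')"
), file.path(proj, "R", "utils-0_project_settings.R"))

# A numbered script starts by sourcing the utilities it needs
writeLines(c(
  "source(file.path('R', 'utils-0_project_settings.R'))",
  "message('Running with ', n_cores, ' cores; writing to ', calib_dir)"
), file.path(proj, "R", "10-calibration_sim.R"))

# Run the numbered script from the project root so relative paths resolve
old_wd <- setwd(proj)
source(file.path("R", "10-calibration_sim.R"))
setwd(old_wd)
```

Because every numbered script sources the same settings file, a value such as the number of cores only has to be changed in one place.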

The workflow scripts, whose names start with workflow_XX-, create the workflow directories used to run heavy computational jobs on HPC.

Finally, the z-test.R script is a scratch script for testing code semi-interactively.

Data in the data/ directory

The data/ directory contains the data required or produced by the scripts. It is organised as follows:

data/
├── input/
│   ├── params.csv
│   └── scenarios.csv
├── intermediate/
└── output/

input/ contains everything the project requires before any R code is run. Usually, this means the parameters for the models and the list of intervention scenarios to be run as part of the analysis. This directory is tracked by git (and thus saved on GitHub).
intermediate/ is for everything that is created by running the scripts and reused by other scripts, but that is not necessary for the final paper. This includes network estimations, calibration artifacts and raw data from the intervention runs. This directory usually fills up quickly and is NOT tracked by git.
output/ stores the final results of the analysis. This directory is tracked by git. It should contain the data relevant to anyone wanting to reproduce your analysis and compare their results with yours.
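The flow of data through the three directories can be sketched with base R as follows. This is an illustration only: the columns of params.csv, the file names raw_results.rds and summary.csv, and the computation itself are all made up for the example.

```r
# Sketch: how scripts might move data through the three directories.
# Column names, values and output file names are invented for illustration.
proj <- tempfile("researchProj_")
for (d in c("input", "intermediate", "output")) {
  dir.create(file.path(proj, "data", d), recursive = TRUE)
}

# input/: exists before any R code runs (tracked by git)
write.csv(
  data.frame(param = c("a", "b"), value = c(0.1, 0.2)),
  file.path(proj, "data", "input", "params.csv"),
  row.names = FALSE
)

# intermediate/: large artifacts reused by later scripts (not tracked by git)
params <- read.csv(file.path(proj, "data", "input", "params.csv"))
raw <- data.frame(param = params$param, sim = params$value * 10)
saveRDS(raw, file.path(proj, "data", "intermediate", "raw_results.rds"))

# output/: small final results needed to reproduce the analysis (tracked by git)
raw <- readRDS(file.path(proj, "data", "intermediate", "raw_results.rds"))
final <- data.frame(n_params = nrow(raw), mean_sim = mean(raw$sim))
write.csv(final, file.path(proj, "data", "output", "summary.csv"), row.names = FALSE)
```

Keeping the heavy intermediate files out of git while committing the small final outputs keeps the repo light without giving up reproducibility checks.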