Applied Modeling Project Structure - EpiModel/EpiModeling GitHub Wiki
This page describes the structure of an EpiModel Applied Modeling Project, specifically the contents of the project repository.
We assume that a researchProj repo has been created from EpiModelHIV-Template. See Getting Started with EpiModelHIV.
researchProj Root
The root of the project is the top-level directory. Inside, the files are organized into various sub-directories.
researchProj/
├── R/
├── data/
├── workflows/
├── README.md
├── renv.lock
└── researchProj.Rproj
- Mimicking the structure of R packages, the R/ directory will contain all the R scripts used by the project.
- The data directory contains files used by the R scripts as well as files created by running the scripts.
- The workflows directory will contain workflow directories used to run
code on High Performance Computing systems (HPC) using the
slurmworkflow package.
- The README.md file should describe the purpose of the code and link to the published article once the project is finished.
- The renv.lock file contains the list of packages used by the project with their respective versions.
- The researchProj.Rproj is the RStudio project file.
R Scripts in the R/ Directory
Modeling projects are complex and involve a lot of scripts. The scripts follow naming conventions that make them easier to navigate.
R/
├── 00-setup_packages.R
├── 01-networks_estimation.R
├── 02-networks_diagnostics.R
├── ...
├── 10-calibration_sim.R
├── 11-calibration_process.R
├── 12-calibration_eval.R
├── ...
├── utils-0_project_settings.R
├── utils-targets.R
├── ...
├── workflow_01-networks_estimation.R
├── workflow_02-model_calibration.R
├── ...
└── z-test.R
First, the numbered scripts start with a two-digit number (e.g. 01-networks_estimation.R). These are the successive steps of the project, and they are meant to be run in numerical order to produce the full analysis. To keep the project organised, the first digit groups the scripts into the major parts of the project:
- 0x-scripts.R: estimation of the network objects
- 1x-scripts.R: calibration of the epidemic model
- ...
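As a rough illustration of this naming convention, the sketch below lists the numbered scripts in run order and groups them by their first digit. The helpers `list_step_scripts` and `group_by_part` are hypothetical names introduced here, not part of EpiModel or the template, and the demonstration uses a temporary directory with empty placeholder files.

```r
# Hypothetical helper: return the numbered step scripts in run order.
# Only files starting with two digits and a dash are considered steps.
list_step_scripts <- function(dir = "R") {
  files <- list.files(dir, pattern = "^[0-9]{2}-.*\\.R$")
  sort(files, method = "radix")
}

# Hypothetical helper: group step scripts by their first digit
# ("0" = network estimation, "1" = calibration, ...).
group_by_part <- function(files) {
  split(files, substr(files, 1, 1))
}

# Demonstration in a temporary directory with empty placeholder files,
# so that no real project is touched.
demo_dir <- file.path(tempdir(), "demo_R")
dir.create(demo_dir, showWarnings = FALSE)
demo_files <- c(
  "10-calibration_sim.R", "01-networks_estimation.R",
  "utils-targets.R", "02-networks_diagnostics.R", "z-test.R"
)
invisible(file.create(file.path(demo_dir, demo_files)))

steps <- list_step_scripts(demo_dir)
parts <- group_by_part(steps)
print(steps)
```

The utils- and z-test.R files are skipped because they do not match the two-digit prefix.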
Then, the utility scripts, starting with utils-, contain code that is used by
several numbered scripts. They should never be run on their own. They are
sourced by the numbered scripts requiring them. This helps us follow the
DRY principle (Don't Repeat Yourself).
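A minimal sketch of this sourcing pattern is shown below. It assumes that numbered scripts call `source()` with a path relative to the project root; the file contents and the `n_cores` setting are hypothetical and only serve the illustration. The demonstration builds a throwaway project in a temporary directory.

```r
# Build a throwaway project layout in a temporary directory.
proj <- file.path(tempdir(), "demo_proj")
dir.create(file.path(proj, "R"), recursive = TRUE, showWarnings = FALSE)

# A utility script defines a shared setting once (hypothetical content).
writeLines(
  "n_cores <- 4  # hypothetical shared setting",
  file.path(proj, "R", "utils-0_project_settings.R")
)

# A numbered script sources the utility script instead of redefining it.
writeLines(
  c(
    'source("R/utils-0_project_settings.R")',
    'message("Running with ", n_cores, " cores")'
  ),
  file.path(proj, "R", "01-networks_estimation.R")
)

# Run the numbered script from the project root, then restore the
# previous working directory.
old_wd <- setwd(proj)
source("R/01-networks_estimation.R")
setwd(old_wd)
```

Because the setting lives in a single utils- file, changing it there updates every numbered script that sources it.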
The workflow scripts, starting with workflow_XX-, create the workflow directories used to run computationally heavy jobs on HPC.
Finally, the z-test.R script is a scratch script for testing code semi-interactively.
Data in the data/ Directory
The data/ directory contains data either required or produced by the scripts. It is organised as follows:
data/
├── input/
│ ├── params.csv
│ └── scenarios.csv
├── intermediate/
└── output/
- input/ contains everything that the project requires before
running any R code: usually, the parameters for the models and the list of
intervention scenarios to be run as part of the analysis. This directory is
tracked by git (and thus saved on GitHub).
- intermediate/ is for everything that is created by running the scripts and reused
by other scripts, but is not necessary for the final paper. This includes
network estimations, calibration artifacts, and raw data from the intervention runs.
This folder usually fills up quickly and is NOT tracked by git.
- output/ stores the final results of the analysis. This directory is
tracked by git. It should contain the data relevant to anyone wanting to reproduce your analysis and compare their results with yours.
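The sketch below illustrates how a script might use the three sub-directories. It is an assumption-laden example, not the template's actual code: the column names, the file names `raw_sims.rds` and `summary.csv`, and the toy values are all hypothetical, and the `.gitignore` entry is only one possible way to keep intermediate/ out of git.

```r
# Build the data/ layout in a throwaway project under a temporary directory.
proj <- file.path(tempdir(), "demo_data_proj")
data_dirs <- file.path(proj, "data", c("input", "intermediate", "output"))
for (d in data_dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)

# input/: small files needed before any R code runs (hypothetical columns).
params <- data.frame(param = "hypothetical.rate", value = 0.1)
write.csv(params, file.path(proj, "data", "input", "params.csv"),
          row.names = FALSE)

# intermediate/: bulky artifacts reused by later scripts (toy values here).
raw <- data.frame(sim = 1:3, outcome = c(10, 12, 11))
saveRDS(raw, file.path(proj, "data", "intermediate", "raw_sims.rds"))

# output/: compact final results that others can compare against.
summary_df <- data.frame(mean_outcome = mean(raw$outcome))
write.csv(summary_df, file.path(proj, "data", "output", "summary.csv"),
          row.names = FALSE)

# One possible way to keep intermediate/ untracked (assumed, not prescribed).
writeLines("data/intermediate/", file.path(proj, ".gitignore"))
```

Only input/ and output/ stay small enough to version, which is why the bulky intermediate files are the ones excluded.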