Setting up the repositories - HopkinsIDD/cholera-mapping-pipeline GitHub Wiki

The data and grid pull components of the cholera mapping pipeline can only be run from the idmodeling server. Set up the repositories below on your idmodeling server account.

Setting up the repos

Start by pulling cholera-mapping-pipeline:

git clone https://github.com/HopkinsIDD/cholera-mapping-pipeline.git

The pipeline is under active development, and branches can differ significantly. For now, we recommend running from the dev branch. If you modify the pipeline, create your own branch named dev_<user initials> and open pull requests only against dev.
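A minimal sketch of that branch workflow, to run from inside your clone of cholera-mapping-pipeline. The function name start_dev_branch is hypothetical, and abc in the usage lines stands in for your initials:

```shell
# Hypothetical helper: switch to the shared dev branch, bring it up to date,
# then create your personal branch dev_<initials> from it.
start_dev_branch() {
  local initials="$1"
  if [ -z "$initials" ]; then
    echo "usage: start_dev_branch <initials>" >&2
    return 1
  fi
  git fetch origin dev || return 1              # update origin/dev from GitHub
  git checkout dev || return 1                  # creates a local dev tracking origin/dev if needed
  git merge --ff-only origin/dev || return 1    # fast-forward only, never a merge commit
  git checkout -b "dev_${initials}"             # your personal working branch
}

# Usage (abc = your initials):
#   start_dev_branch abc
#   git push -u origin dev_abc    # publish the branch, then open pull requests against dev
```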

Jan 2023: Cloning cholera-covariates should no longer be necessary to install the pipeline. The steps from here to "Jan 2023: End Skip" may be skipped.

Once cholera-mapping-pipeline is downloaded, clone the repo cholera-covariates into a folder called Layers and LFS-pull its content (this may take a while):

git clone https://github.com/HopkinsIDD/cholera-covariates.git cholera-mapping-pipeline/Layers

The file Layers/covariate_dictionary.yml contains the covariate metadata which is used when creating the input to the mapping pipeline. In most cases this repo is only used to read covariates and users should not commit changes to the files in the repo.

If you need to ingest new covariates into the database, LFS-pull the covariate files:

cd Layers
git lfs pull

Jan 2023: End Skip

Instead of cloning the cholera-covariates repo directly, we recommend creating a folder called Layers inside the cholera-mapping-pipeline folder:

cd cholera-mapping-pipeline
mkdir Layers

Download covariate_dictionary.yml from the cholera-covariates repo (https://github.com/HopkinsIDD/cholera-covariates.git) and move it to cholera-mapping-pipeline/Layers.
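If you prefer to do the download from the command line, it can be scripted as below. This is a sketch that assumes covariate_dictionary.yml sits at the root of cholera-covariates (consistent with the Layers/covariate_dictionary.yml path mentioned earlier) and is not itself stored in Git LFS. The helper name fetch_covariate_dictionary is hypothetical. It makes a shallow clone with GIT_LFS_SKIP_SMUDGE set, so the large covariate files are not downloaded.

```shell
# Hypothetical helper: copy covariate_dictionary.yml from cholera-covariates into Layers/
# without downloading the repo's large LFS files.
fetch_covariate_dictionary() {
  local repo_url="${1:-https://github.com/HopkinsIDD/cholera-covariates.git}"
  local dest_dir="${2:-cholera-mapping-pipeline/Layers}"
  local tmp status
  tmp="$(mktemp -d)" || return 1
  # Shallow clone; GIT_LFS_SKIP_SMUDGE leaves LFS-tracked files as small pointer files.
  if GIT_LFS_SKIP_SMUDGE=1 git clone -q --depth 1 "$repo_url" "$tmp/covariates" &&
     mkdir -p "$dest_dir" &&
     cp "$tmp/covariates/covariate_dictionary.yml" "$dest_dir/"; then
    status=0
  else
    status=1
  fi
  rm -rf "$tmp"    # the temporary clone is no longer needed
  return "$status"
}

# Usage, from the folder that contains cholera-mapping-pipeline:
#   fetch_covariate_dictionary
```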

Clone the repo cholera-configs into a folder called Analysis/configs. This repo holds all YML configuration files as well as covariate_dictionary.yml, which contains the covariate metadata used when creating the input to the mapping pipeline. Configuration files are organized into folders named after runs, with a separate config file for each country.

git clone git@github.com:HopkinsIDD/cholera-configs.git cholera-mapping-pipeline/Analysis/configs

Clone the repo cholera-mapping-output-1 into a folder called Analysis/data. This repo holds the rdata files generated by the pipeline (e.g., stan_input, initial_values, stan_output, generated quantities). All files associated with a set of runs are kept in an appropriately named branch. If you are having trouble checking out branches, see the directions below.

git clone git@github.com:HopkinsIDD/cholera-mapping-output-1.git cholera-mapping-pipeline/Analysis/data

Clone the repo cholera-mapping-reports into a folder called Analysis/output, which stores logs and diagnostic reports from production runs. These files were originally stored in cholera-configs, but they grew too large and were moved to this repo (some reports may still exist in cholera-configs). Folders holding diagnostic reports and logs are named the same as the corresponding folders in cholera-configs.

git clone git@github.com:HopkinsIDD/cholera-mapping-reports.git cholera-mapping-pipeline/Analysis/output
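The three clones above can be combined into one re-runnable step. This is a sketch with a hypothetical helper name, clone_analysis_repos; it uses the same SSH URLs and target folders as the commands above and skips any folder that is already a git checkout, so it is safe to run again after a partial setup.

```shell
# Hypothetical helper: clone the three companion repos into Analysis/,
# skipping any destination that is already a git checkout.
clone_analysis_repos() {
  local root="${1:-cholera-mapping-pipeline}"
  local base="${2:-git@github.com:HopkinsIDD}"
  local pair repo dest
  for pair in cholera-configs:configs \
              cholera-mapping-output-1:data \
              cholera-mapping-reports:output; do
    repo="${pair%%:*}"                 # text before the colon: repository name
    dest="$root/Analysis/${pair#*:}"   # text after the colon: target folder
    if [ -d "$dest/.git" ]; then
      echo "skipping $dest (already cloned)"
      continue
    fi
    git clone "$base/$repo.git" "$dest" || return 1   # git creates missing parent folders
  done
}

# Usage, from the folder that contains cholera-mapping-pipeline:
#   clone_analysis_repos
```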

Dealing with large repositories

If you have trouble checking out branches in a repo that uses Git Large File Storage (cholera-covariates, cholera-mapping-output-1, cholera-mapping-reports), the following steps resolve the issue by checking out the branch without the large files and then downloading them separately:

export GIT_LFS_SKIP_SMUDGE=true
git checkout <branch>
unset GIT_LFS_SKIP_SMUDGE
git lfs pull
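The four commands above can be wrapped in a small function. This is a sketch; lfs_checkout is a hypothetical name. Putting GIT_LFS_SKIP_SMUDGE in front of git checkout scopes it to that single command, which replaces the export/unset pair, and the git lfs pull step is skipped with a warning when git-lfs is not installed.

```shell
# Hypothetical wrapper for the export / checkout / unset / lfs pull sequence.
lfs_checkout() {
  local branch="$1"
  [ -n "$branch" ] || { echo "usage: lfs_checkout <branch>" >&2; return 1; }
  # The variable applies to this checkout only, so nothing needs to be unset afterwards.
  GIT_LFS_SKIP_SMUDGE=true git checkout "$branch" || return 1
  if git lfs version >/dev/null 2>&1; then
    git lfs pull    # now download the real LFS content for this branch
  else
    echo "git-lfs not found; LFS files on $branch were left as pointer files" >&2
  fi
}

# Usage, from inside e.g. cholera-mapping-pipeline/Analysis/data:
#   lfs_checkout <branch>
```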

Code Style

Our repository uses an automated code styler that runs as a git pre-commit hook. All code contributors should set up the pre-commit file in their local clone of the repository.

  1. Find the pinned note in the Slack channel cholera-taxonomy that discusses a "pre-commit" file.
  2. Download the pre-commit file and move it to cholera-mapping-pipeline/.git/hooks.
  3. Make sure that the pre-commit file is executable. In the terminal, type:
     chmod 755 cholera-mapping-pipeline/.git/hooks/pre-commit
  4. You may need to add a .gitconfig option for the pre-commit file to work. In the terminal, type:
     git config --global hooks.R.Rscript Rscript
  5. You may also need to install the R package formatR.

Congratulations, now any time you make a commit, your code will be automatically styled according to the settings in the pre-commit file.
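The hook setup steps above can be sketched as one function, assuming you have already downloaded the pre-commit file from the pinned Slack note. The name install_precommit_hook is hypothetical. For formatR it only reports whether the package is missing rather than installing it.

```shell
# Hypothetical helper: install a downloaded pre-commit file into a clone's hooks folder.
install_precommit_hook() {
  local hook_file="$1"                               # path to the downloaded pre-commit file
  local repo_dir="${2:-cholera-mapping-pipeline}"
  local hooks_dir="$repo_dir/.git/hooks"
  [ -f "$hook_file" ] || { echo "pre-commit file not found: $hook_file" >&2; return 1; }
  [ -d "$repo_dir/.git" ] || { echo "not a git clone: $repo_dir" >&2; return 1; }
  mkdir -p "$hooks_dir"
  cp "$hook_file" "$hooks_dir/pre-commit" || return 1
  chmod 755 "$hooks_dir/pre-commit"                  # the hook must be executable
  git config --global hooks.R.Rscript Rscript        # the .gitconfig option the hook may need
  # Only report on formatR; install it yourself if the message appears.
  if command -v Rscript >/dev/null 2>&1; then
    Rscript -e 'if (!requireNamespace("formatR", quietly = TRUE)) message("formatR is not installed")'
  fi
}

# Usage:
#   install_precommit_hook ~/Downloads/pre-commit cholera-mapping-pipeline
```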

Installing taxdat

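The pipeline relies on the package taxdat. Earlier instructions asked you to copy the Rscript containing your Cholera Taxonomy and postgres credentials (Analysis/R/database_api_key.R) into the package folder before installing it; this is no longer recommended (reasons below). To install the package, run the following from the cholera-mapping-pipeline folder: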

# cp Analysis/R/database_api_key.R packages/taxdat/R   (no longer recommended; see below)
R -e "roxygen2::roxygenize('packages/taxdat')"
R CMD INSTALL packages/taxdat

The database_api_key.R script does not need to be inside the R package. If it is there, it causes problems when the package NAMESPACE is updated and eventually leads to unit test failures upon merging.
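To avoid building taxdat with the credentials script still inside it, a small check can run before the install commands. This is a sketch; check_taxdat_clean is a hypothetical name, and it only looks for the database_api_key.R file named above.

```shell
# Hypothetical pre-install check: fail if the credentials script sits inside the package,
# since that causes NAMESPACE problems and later unit test failures.
check_taxdat_clean() {
  local pkg_dir="${1:-packages/taxdat}"
  if [ -f "$pkg_dir/R/database_api_key.R" ]; then
    echo "remove $pkg_dir/R/database_api_key.R before installing taxdat" >&2
    return 1
  fi
}

# Usage, from the cholera-mapping-pipeline folder:
#   check_taxdat_clean && R -e "roxygen2::roxygenize('packages/taxdat')" && R CMD INSTALL packages/taxdat
```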

For the idm cluster

srun --nodelist=idmodeling2 --mem=8G -c 4 --time=02:00:00 --pty /opt/R/4.0.3/bin/R

This command starts an interactive R session on idmodeling2 (8 GB of memory, 4 CPUs, 2-hour time limit). Then type the following commands:

setwd("{your_own_path}/cholera-mapping-pipeline/packages/taxdat/")
devtools::document()
install.packages(".", type = "source", repos = NULL)

Replace {your_own_path} with the path to the folder that contains your clone of cholera-mapping-pipeline. If everything goes well, the package taxdat will be installed successfully.
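If you would rather not type into an interactive session, the same R commands can be submitted through srun in one step. This is a sketch under two assumptions: an Rscript binary exists alongside R in /opt/R/4.0.3/bin, and your path contains no single quotes. The function build_taxdat_on_idm and the SRUN override (set SRUN=echo to print the command instead of running it) are hypothetical.

```shell
# Hypothetical non-interactive variant of the srun session above.
# Assumes /opt/R/4.0.3/bin/Rscript exists; SRUN=echo gives a dry run.
build_taxdat_on_idm() {
  local repo_parent="$1"   # what you would substitute for {your_own_path}
  local pkg_dir="$repo_parent/cholera-mapping-pipeline/packages/taxdat/"
  "${SRUN:-srun}" --nodelist=idmodeling2 --mem=8G -c 4 --time=02:00:00 \
    /opt/R/4.0.3/bin/Rscript -e \
    "setwd('$pkg_dir'); devtools::document(); install.packages('.', type = 'source', repos = NULL)"
}

# Usage:
#   SRUN=echo build_taxdat_on_idm "$HOME"   # print the command only
#   build_taxdat_on_idm "$HOME"             # run it on idmodeling2
```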

Some known issues

Package terra

If installing terra fails with an error such as "unable to collate and parse R files for package terra", open https://cran.r-project.org/src/contrib/Archive/terra/ in a browser on your own machine.

  1. Copy the URL of terra_1.7-39.tar.gz.
  2. In your idm folder, run wget url_for_terra to download the terra source tarball.
  3. In the R terminal, type install.packages("absolute_path_to_terra_tarball", type="source", repos=NULL). Once terra is installed, try installing the package taxdat again.
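The terra workaround above can be scripted as follows. This is a sketch: terra_archive_url and install_terra_from_archive are hypothetical helpers, the URL assumes CRAN's usual package_version.tar.gz naming in the Archive folder, and the install step assumes Rscript is on your PATH (use the full /opt/R/4.0.3/bin path on idm if it is not).

```shell
# Hypothetical helpers for installing an archived terra release from source.
terra_archive_url() {
  local version="${1:-1.7-39}"
  echo "https://cran.r-project.org/src/contrib/Archive/terra/terra_${version}.tar.gz"
}

install_terra_from_archive() {
  local version="${1:-1.7-39}"
  local tarball="$PWD/terra_${version}.tar.gz"   # absolute path, as in the steps above
  wget -O "$tarball" "$(terra_archive_url "$version")" || return 1
  Rscript -e "install.packages('$tarball', type = 'source', repos = NULL)"
}

# Usage, from your idm folder:
#   install_terra_from_archive
```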

.gitconfig settings

Our repository has contributors working on different platforms (Linux, Mac, Windows), so we need an option in .gitconfig that keeps line endings consistent across platforms. Documentation on this issue is here. If you have not already done so for other projects, you will need to edit your global ~/.gitconfig or a repository-specific <repository>/.gitattributes file. To edit your global .gitconfig, execute the line for your platform in the terminal:

Mac users

$ git config --global core.autocrlf input

Windows users

$ git config --global core.autocrlf true

Linux users

$ git config --global core.autocrlf input
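Because the value depends only on the platform, the choice can be scripted. This is a sketch: autocrlf_for_platform is a hypothetical helper, and it assumes Windows users run Git Bash, MSYS2 or Cygwin, whose `uname -s` output starts with MINGW, MSYS or CYGWIN.

```shell
# Hypothetical helper: map `uname -s` output to the core.autocrlf values listed above.
autocrlf_for_platform() {
  local platform="${1:-$(uname -s)}"
  case "$platform" in
    Darwin|Linux)          echo input ;;   # Mac and Linux
    MINGW*|MSYS*|CYGWIN*)  echo true ;;    # Windows shells
    *) echo "unrecognized platform: $platform" >&2; return 1 ;;
  esac
}

# Usage:
#   git config --global core.autocrlf "$(autocrlf_for_platform)"
```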