Moving Calibration and Location files into Companion Packages - laser-base/laser-core GitHub Wiki

Problem Statement

It is easy, but undesirable, to build our calibration support code and geography files into the disease model package itself. For example, suppose we have built a laser-leprosy model in a laser-leprosy repo, with an installable laser-leprosy package, and we want to calibrate the model to some data from Lesotho. It would be tempting to add a calibration/ folder with a bunch of source code, plus some data files in Lesotho/. But if we grant that the general model should be kept separate from any particular location, project, or calibration study, there are several ways to achieve that separation, each with its own pros and cons.

Proposed Solution: Companion Package

We propose that all calibration code and geography data files go into their own repository, which is published as a companion package to the disease model.

Rationale

Python's import system and pathing are designed around installed packages, not around loose files in folders and sub-folders. At the same time, we as users want "user-space" code (and config and data files) that we can change easily without continuously worrying about testing, publishing, packaging, and versioning. That is the tension.

Python packaging supports "companion packages" through optional dependencies (extras). For example, instead of doing:

pip3 install laser-leprosy

one could do:

pip3 install "laser-leprosy[lesotho]"

In this case, laser-leprosy is installed as usual, and its lesotho extra also installs the companion package laser-leprosy-lesotho, which contains the calibration scripts, configs, and data for running calibrations of leprosy in Lesotho. We can then run a calibration by doing:

python3 -m laser_leprosy_lesotho.calibrate
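
Because the configs and data travel inside the installed companion package, the calibration code needs a way to find them that does not depend on the current working directory. The following is a minimal sketch of one way to do that with the standard library's importlib.resources (Python 3.9+). The import name laser_leprosy_lesotho and the file name calib_config.json are assumptions for illustration, not names the companion package is known to use.

```python
"""Sketch of a hypothetical laser_leprosy_lesotho/calibrate.py helper."""
import json
from importlib import resources


def load_packaged_json(package: str, filename: str) -> dict:
    """Read a JSON file that ships inside an importable package."""
    # resources.files() resolves relative to wherever the package lives,
    # so the lookup does not depend on the current working directory.
    text = resources.files(package).joinpath(filename).read_text(encoding="utf-8")
    return json.loads(text)


def main(package: str = "laser_leprosy_lesotho",
         filename: str = "calib_config.json") -> None:
    # Both default names are hypothetical placeholders.
    config = load_packaged_json(package, filename)
    print(f"Loaded calibration config with keys: {sorted(config)}")
```

A real calibrate.py would call main() under an `if __name__ == "__main__":` guard so that `python3 -m` runs it. Note that the importable module name uses underscores even though the distribution name uses hyphens.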

Use Cases

In practice, we want to preserve the current ability to treat all the code as locally editable. Right now we can do:

git clone https://github.com/InstituteforDiseaseModeling/laser-leprosy.git

and then

pip3 install -e .

But if we now do:

pip3 install -e ".[lesotho]"

This installs the laser-leprosy-lesotho package, but as a regular install rather than from an editable checked-out repo. To get an editable checkout, we'll need to git clone both repos:

git clone https://github.com/InstituteforDiseaseModeling/laser-leprosy.git
git clone https://github.com/InstituteforDiseaseModeling/laser-leprosy-lesotho.git

And then edit the laser-leprosy/pyproject.toml as follows:

[project.optional-dependencies]
lesotho = ["laser-leprosy-lesotho @ file://../laser-leprosy-lesotho"]
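
After installing, it can be useful to confirm which of the two packages actually ended up editable. The sketch below is one way to check, assuming a pip recent enough to record PEP 610 metadata (a direct_url.json file) for installs from a local directory. The function names are illustrative.

```python
import json
from importlib import metadata
from typing import Optional


def editable_flag(direct_url_text: Optional[str]) -> bool:
    """Return True if PEP 610 direct_url.json content marks an editable install."""
    if not direct_url_text:
        # No direct_url.json: installed from an index, not a local directory.
        return False
    info = json.loads(direct_url_text)
    return bool(info.get("dir_info", {}).get("editable", False))


def is_editable(dist_name: str) -> bool:
    """Check an installed distribution by name, e.g. "laser-leprosy-lesotho"."""
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return False
    # read_text() returns None when the metadata file is absent.
    return editable_flag(dist.read_text("direct_url.json"))
```

For example, `is_editable("laser-leprosy")` and `is_editable("laser-leprosy-lesotho")` can be compared after an install. If the companion reports False, installing it with `pip3 install -e` from its own checkout is one option.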

Containers

Docker/AKS

One use case in our current workflows requires everything needed to run a calibration to be fully containerized in a Docker image, with no mapped directories of customized input data. Here the Dockerfile can install the companion package using the extras install option described above, and nothing further is needed.

Singularity/COMPS

In our COMPS workflow, we convert the Docker image to a SIF (Singularity Image File). This is fine. But we then assetize and upload the calibration scripts, data, and configs, which are sometimes locally modified. For this we can treat laser_leprosy_lesotho/ as a pure-Python source package in Assets. Rather than building and installing a .whl, we keep the package in editable source form and upload the whole directory as an asset to COMPS, using:

calib_task.common_assets.add_assets(AssetCollection.from_directory("laser-leprosy-lesotho", relative_path="laser-leprosy-lesotho"))

And then set PYTHONPATH to include it...

Add to your WorkOrder/environment:
"Environment": {
  "PYTHONPATH": "$PYTHONPATH:$PWD/Assets/laser-leprosy-lesotho"
}

and we can still run:

python -m laser_leprosy_lesotho.calibrate
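
The PYTHONPATH approach can be tried locally before uploading anything. The sketch below builds a throwaway source directory laid out like the asset, appends it to PYTHONPATH, and runs a module from it with `python -m`. The package name laser_leprosy_lesotho and the one-line calibrate.py are stand-ins for illustration only. No COMPS or idmtools calls are involved.

```python
import os
import subprocess
import sys
from pathlib import Path


def make_demo_package(source_root: Path) -> None:
    """Create a stand-in source package under source_root (illustrative only)."""
    pkg = source_root / "laser_leprosy_lesotho"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "calibrate.py").write_text('print("calibrating from source")\n')


def run_module_from_source_dir(source_root: Path, module: str) -> str:
    """Run `python -m module` with source_root appended to PYTHONPATH."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH", "")
    # Mirrors "$PYTHONPATH:$PWD/Assets/..." but skips an empty leading entry.
    env["PYTHONPATH"] = os.pathsep.join(p for p in (existing, str(source_root)) if p)
    result = subprocess.run(
        [sys.executable, "-m", module],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```

The directory placed on PYTHONPATH must be the one that contains the laser_leprosy_lesotho/ folder, not that folder itself. Otherwise `python -m laser_leprosy_lesotho.calibrate` will not resolve.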