Adding new covariates - HopkinsIDD/cholera-mapping-pipeline GitHub Wiki

This page refers to adding new covariates in the old pipeline.

Creating a config entry

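The first step to adding a covariate is to create a config entry for it. The fields in the entry depend on whether the covariate is static in time or variable in time.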

Static in Time

The config entry for a covariate that is static in time should look like this:

dist_to_lakes:
  name: distance_to_lakes
  description: Distance to closest lake
  alias: dist_lakes
  abbr: dl
  dir: lake_dist/lake_dist_5km_raster.tif
  type: static
  time_aggregator: ~ 
  space_aggregator: average
  transform: ~
  unit: km

  • name : Name of the covariate. Can be used to access the covariate from the pipeline.
  • description : High-level description of the covariate. Unused by the pipeline.
  • alias : Short name of the covariate. Can be used to access the covariate from the pipeline.
  • abbr : Very short name for the covariate. Used to create filenames for pipeline results that use this covariate.
  • dir : Path to a single GeoTIFF file. The field is called dir because, for temporal covariates, it points to a directory of files.
  • type : Always "static".
  • time_aggregator : Always "~".
  • space_aggregator : Function used to aggregate data of this type across space. Used by the pipeline to aggregate to different spatial scales.
  • transform : Function used to transform the data. The pipeline appears to apply it after aggregation, but this has not been confirmed.
  • unit : Units associated with this data. Unused by the pipeline.
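
The field rules above can be checked mechanically before running the pipeline. The following is a minimal sketch, not pipeline code: the helper name check_static_entry is hypothetical, and the entry is written as a Python dict that mirrors the YAML so no YAML parser is needed (YAML "~" corresponds to None).

```python
# Hypothetical check, not part of the pipeline: verify that a static
# covariate entry carries the fields described above.
STATIC_FIELDS = {
    "name", "description", "alias", "abbr", "dir",
    "type", "time_aggregator", "space_aggregator", "transform", "unit",
}


def check_static_entry(entry):
    """Return a list of problems found in a static covariate entry."""
    problems = []
    missing = STATIC_FIELDS - entry.keys()
    if missing:
        problems.append("missing fields: " + ", ".join(sorted(missing)))
    if entry.get("type") != "static":
        problems.append('type must be "static"')
    if entry.get("time_aggregator") is not None:
        problems.append("time_aggregator must be ~ (null)")
    return problems


# The dist_to_lakes entry from above, with YAML "~" written as None.
dist_to_lakes = {
    "name": "distance_to_lakes",
    "description": "Distance to closest lake",
    "alias": "dist_lakes",
    "abbr": "dl",
    "dir": "lake_dist/lake_dist_5km_raster.tif",
    "type": "static",
    "time_aggregator": None,
    "space_aggregator": "average",
    "transform": None,
    "unit": "km",
}

print(check_static_entry(dist_to_lakes))  # prints []
```

An empty list means the entry has every documented field and the two fixed values ("static" and "~") are set as described.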

Variable in Time

pop:
  name: population
  description: population in each gridcell
  alias: pop
  abbr: p
  dir: pop/
  type: temporal
  time_aggregator: average
  res_time: 1 years
  space_aggregator: sum
  transform: ~
  unit: number

  • name : Name of the covariate. Can be used to access the covariate from the pipeline.
  • description : High-level description of the covariate. Unused by the pipeline.
  • alias : Short name of the covariate. Can be used to access the covariate from the pipeline.
  • abbr : Very short name for the covariate. Used to create filenames for pipeline results that use this covariate.
  • dir : Directory containing the NetCDF files. Historically, we have used one NetCDF file per time slice; it is unclear whether this is required.
  • type : Always "temporal".
  • time_aggregator : Function used to aggregate data of this type across time. Used by the pipeline to aggregate to different time scales.
  • res_time : The time resolution of the flat files. Used by the pipeline to determine the grid size in time.
  • space_aggregator : Function used to aggregate data of this type across space. Used by the pipeline to aggregate to different spatial scales.
  • transform : Function used to transform the data. The pipeline appears to apply it after aggregation, but this has not been confirmed.
  • unit : Units associated with this data. Unused by the pipeline.
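
The res_time value in the example ("1 years") is a count followed by a unit. As an illustration only, the sketch below splits such a string into its two parts. The function name parse_res_time is hypothetical, and the assumption that every res_time value has the form "<count> <unit>" comes from the single example above, not from the pipeline's own parsing code.

```python
# Hypothetical helper, not part of the pipeline. Assumes res_time always has
# the form "<count> <unit>", as in the "1 years" example above.
def parse_res_time(res_time):
    """Split a res_time string such as "1 years" into (count, unit)."""
    parts = res_time.split()
    if len(parts) != 2:
        raise ValueError(f"expected '<count> <unit>', got {res_time!r}")
    count, unit = parts
    if not count.isdigit() or int(count) < 1:
        raise ValueError(f"count must be a positive integer, got {count!r}")
    return int(count), unit


print(parse_res_time("1 years"))  # prints (1, 'years')
```

A check like this can catch a malformed res_time (for example a missing unit) before the pipeline tries to build its time grid.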

Converting to netcdf4

Most of the data sets we use are not originally NetCDF4 (.nc) files. To convert them, use the taxdat::write_netcdf function.