Coding Good Practices and Some Tips - EpiModel/EpiModeling GitHub Wiki
Introduction
Before going into specifics, here are a few common rules one should try to follow when doing any kind of programming.
- Fail Faster: It's better to crash early than to find out far in the future
- Fail Clearer: When your program fails, it should be explicit and clear why
- Code is read way more than it is written: Make the code clear for an external user (you in a week)
- Make it work first, optimize second: when in doubt make the code safe
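As an illustration of "Fail Faster" and "Fail Clearer", here is a minimal sketch of an input check that stops as soon as a bad value is seen and says exactly what is wrong. The helper check_prob is hypothetical (it is not part of EpiModel), and the parameter name is used only as an example.

```r
# Hypothetical helper: validate a probability parameter as soon as it is read.
check_prob <- function(x, name) {
  ok <- is.numeric(x) && length(x) == 1 && !is.na(x) && x >= 0 && x <= 1
  if (!ok) {
    stop(
      "`", name, "` must be a single probability in [0, 1], got: ",
      paste(format(x), collapse = ", "),
      call. = FALSE
    )
  }
  invisible(x)
}

# A valid value passes silently and is returned unchanged
syph.prob <- check_prob(0.01, "syph.prob")

# A bad value stops right away with an explicit message
err_msg <- tryCatch(
  check_prob(1.5, "syph.prob"),
  error = function(e) conditionMessage(e)
)
```

Checking the value where it enters the code keeps the error close to its cause, instead of letting an impossible probability surface many timesteps later as an odd result.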
Coding Base Resources
It is highly recommended to read the R for Data Science book. It will give you a good foundation for your R coding journey.
Naming conventions
Attributes and parameters
Format
1. always.use.dot.case
2. never use underscores (_)
3. only lower case
These are not only good practices. Not respecting these rules (1 and 2 in particular) can lead to actual errors because of what EpiModel expects.
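The format rules can be checked mechanically. The sketch below uses a hypothetical helper, is_dot_case, built on a regular expression; allowing digits inside a name is an assumption made for this example, not a rule stated here.

```r
# Hypothetical helper: TRUE for names written in lower case dot.case.
# Assumption: digits are allowed after the first letter.
is_dot_case <- function(x) {
  grepl("^[a-z][a-z0-9]*(\\.[a-z0-9]+)*$", x)
}

candidate_names <- c("syph.inf.last", "syph_inf_last", "Syph.Inf", "hiv.dx")
name_ok <- is_dot_case(candidate_names)

# Names that break the format rules
bad_names <- candidate_names[!name_ok]
```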
Suffixes
Attributes and Parameters are often of the same type. We try to indicate what to expect from an attribute and parameter with a set of common suffixes:
- Attributes:
  - .last: a timestep where something happened for the last time
  - .count: the number of times something happened
- Parameters:
  - .int: an interval as a number of timesteps
  - .or: an odds ratio
  - .prob: a probability [0, 1]
  - .rate: a rate, the probability of something happening per timestep [0, 1]
Prefixes
To improve the model clarity, try to use a common prefix for the attributes and parameters referring to the same thing.
- gono.: things related to gonorrhea
- syph.: things related to syphilis
- ...
Common Elements
Finally, we often use the same components over and over. Try to use the commonly used terms to refer to them:
- dx: diagnosis / diagnosed
- ndx: not diagnosed
- tx: treatment / treated
- ntx: untreated
- inf: infection / infected
- test: test (diagnostic test / screening)
- sympt: symptom / symptomatic
- asympt: asymptomatic
- ...
Examples
Here are some syphilis attributes and parameters:
- Attributes:
  - syph.inf: is the node infected with syphilis? (0 or 1)
  - syph.inf.last: when did the last syphilis infection occur?
  - syph.inf.count: number of syphilis infections
  - syph.dx: is the node diagnosed with syphilis? (0 or 1)
  - syph.tx: is the node treated for syphilis? (0 or 1)
- Parameters:
  - syph.prob: probability of getting infected with syphilis per sex act
  - syph.sympt.tx.prob: probability of getting treated for syphilis if symptomatic
  - syph.screen.hivneg.rate: per-timestep probability of getting screened for syphilis if HIV negative
When in doubt, try to mimic the conventions used in the project.
Variables in Modules
dat sub-elements
When assigning a variable using get_attr or get_param, keep the original name of the attribute or parameter.
Inner Variables
All other variables should follow these rules:
- snake_case
- only lower case
- never use dots (.)
This naming distinction makes it easy to tell apart what comes from dat and what has been defined elsewhere.
Similar to attributes and parameters, a set of common suffixes is often used:
- _ids: positional IDs
- _acts: positions in the act list
- _name: name of something (not the thing)
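The two naming styles can be seen side by side in the sketch below. To keep it runnable without EpiModel, get_attr and get_param are simplified local stand-ins for the EpiModel accessors, and the layout of the toy dat object (dat$attr, dat$param) is an assumption made for illustration only.

```r
# Simplified local stand-ins for the EpiModel accessors (assumption:
# attributes live in dat$attr and parameters in dat$param).
get_attr <- function(dat, item) dat$attr[[item]]
get_param <- function(dat, item) dat$param[[item]]

# Toy `dat` object with two attributes and one parameter
dat <- list(
  attr = list(
    syph.inf = c(1, 0, 1, 1, 0),
    syph.tx = c(0, 0, 1, 0, 0)
  ),
  param = list(syph.sympt.tx.prob = 0.9)
)

# Values read from `dat` keep their original dot.case names
syph.inf <- get_attr(dat, "syph.inf")
syph.tx <- get_attr(dat, "syph.tx")
syph.sympt.tx.prob <- get_param(dat, "syph.sympt.tx.prob")

# Variables created inside the module use snake_case and the common suffixes
untreated_ids <- which(syph.inf == 1 & syph.tx == 0)
n_untreated <- length(untreated_ids)
attr_name <- "syph.tx"
```

Reading the code, anything with a dot came out of dat, and anything with an underscore was computed locally.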
Attributes Default Values
Theoretically, an attribute can take any scalar value. However, things are easier when these rules are followed for the defaults:
- avoid NA as much as possible
  - someone HIV negative should have the value 0 for hiv.dx and not NA
  - this limits the need to always check for the NA edge cases
- flags should be 0 or 1; it's rare to have a case where NA is useful
- timesteps like .last should be -Inf by default
  - it means the event never occurred and is still a valid number to do computations with
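A small sketch of why -Inf is a convenient default for .last attributes. The current timestep (at), the 26-timestep window, and the attribute values are all made up for the example.

```r
at <- 100                      # current timestep (hypothetical value)
syph.inf.last <- rep(-Inf, 5)  # default: no node has been infected yet
syph.inf.last[c(2, 4)] <- c(90, 40)

# Never-infected nodes get an infinite delay, so comparisons just work
time_since_inf <- at - syph.inf.last
recent_ids <- which(time_since_inf <= 26)
n_recent <- sum(time_since_inf <= 26)

# With NA as the default, the same count needs explicit NA handling
syph_last_na <- c(NA, 90, NA, 40, NA)
n_recent_na <- sum(at - syph_last_na <= 26)                   # NA
n_recent_fixed <- sum(at - syph_last_na <= 26, na.rm = TRUE)  # needs na.rm
```

With the -Inf default, the count and the subset come out right without any special case; with NA, every summary has to remember na.rm = TRUE.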
Subsetting the Population
Goal: get the positional IDs of nodes that match a given set of conditions
Example:
- HIV positive nodes
- Diagnosed for their HIV
- Not on PrEP
- Circumcised
- Infected with syphilis
- Treated for their syphilis
Simplest: explicitly list all conditions
elig_ids <- which(
  status == 1 &
  diag.status == 1 &
  prep == 0 &
  circ == 1 &
  syph.inf == 1 &
  syph.tx == 1
)
Optimized: remove redundant conditions
- Being diagnosed with HIV implies HIV infection (in this model)
- PrEP is not possible when diagnosed with HIV (in this model)
- Syphilis treatment implies syphilis infection (in this model)
elig_ids <- which(
  diag.status == 1 &
  circ == 1 &
  syph.tx == 1
)
Subset id optimization:
Because syph.tx == 1 is very rare, it's faster to first get these nodes, then keep only the circumcised ones, and finally keep only the ones with an HIV+ diagnosis.
elig_ids <- which(syph.tx == 1)
elig_ids <- elig_ids[circ[elig_ids] == 1]
elig_ids <- elig_ids[diag.status[elig_ids] == 1]
This last optimization is only useful when one of the conditions is rare (~10% of the population).
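To check that the three versions select the same nodes, here is a self-contained sketch on simulated attributes. The population size and all probabilities are invented for the example; the attribute vectors are built so that the model implications listed above hold (diagnosis implies infection, no PrEP when diagnosed, treatment implies syphilis infection).

```r
set.seed(1)
n <- 10000

# Simulated attributes (made-up probabilities)
status <- rbinom(n, 1, 0.2)
diag.status <- status * rbinom(n, 1, 0.8)      # diagnosed only if infected
prep <- (1 - diag.status) * rbinom(n, 1, 0.1)  # no PrEP when diagnosed
circ <- rbinom(n, 1, 0.5)
syph.inf <- rbinom(n, 1, 0.05)
syph.tx <- syph.inf * rbinom(n, 1, 0.5)        # treated only if infected

# 1. Simplest: all conditions listed explicitly
ids_full <- which(
  status == 1 & diag.status == 1 & prep == 0 &
    circ == 1 & syph.inf == 1 & syph.tx == 1
)

# 2. Redundant conditions removed
ids_reduced <- which(diag.status == 1 & circ == 1 & syph.tx == 1)

# 3. Stepwise, starting from the rarest condition
ids_stepwise <- which(syph.tx == 1)
ids_stepwise <- ids_stepwise[circ[ids_stepwise] == 1]
ids_stepwise <- ids_stepwise[diag.status[ids_stepwise] == 1]

# All three approaches must agree
same_ids <- identical(ids_full, ids_reduced) &&
  identical(ids_reduced, ids_stepwise)
```

A check like this is a cheap way to confirm that an optimized subset is still equivalent to the explicit one before relying on it.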
Important Conclusion
As a general rule, always use the simplest version first and optimize later if relevant.
When in doubt about the redundancy of two conditions, keep both.
Never rely on "common sense": only act if you are sure of what is happening within the model.