Easy reprex for ergm with protected data: the trace function
To debug someone else's ergm analysis, we often need access to the data they are using, as well as the script. A very neat way to bundle all of this up is the trace() function. Looking at the documentation for this function, I honestly don't know how Pavel figured out it could be used for this purpose. So here is an example from a model we just debugged:
trace(ergm, quote(save(list=ls(), file="ergm_dump.rda")))
ego_inst <- rm_nas(all_ego_data$inst)
ego_wt_i <- dplyr::left_join(ego_inst$egos, WAwt, by = "ego.id")
ego_inst$egoWt <- ego_wt_i$weight
model_inst <- ego_inst ~ edges +
  nodefactor("deg.main", levels = I(0)) +
  nodefactor("deg.casl", levels = I(0)) +
  nodefactor("risk.grp", levels = -5) +
  nodefactor("race", levels = -3) +
  nodefactor("region", levels = -2) +
  nodematch("race", diff = TRUE) +
  absdiff("sqrt.age") +
  offset(nodematch("role.class", diff = TRUE, levels = 1:2))
inst_fit <- ergm.ego(model_inst,
                     offset.coef = rep(-Inf, 2),
                     control = control.ergm.ego(ppopsize = ppop))
The trace function will quietly let you know it's working:
> trace(ergm, quote(save(list=ls(), file="ergm_dump.rda")))
Tracing function "ergm" in package "ergm"
[1] "ergm"
...
> inst_fit <- ergm.ego(model_inst,
+ offset.coef = rep(-Inf, 2),
+ control = control.ergm.ego(ppopsize=ppop)
+ )
Constructing pseudopopulation network.
Note: Constructed network has size 14410, different from requested 15000. Estimation should not be meaningfully affected.
Tracing ergm(ergm.formula, target.stats = m, offset.coef = ergm.offset.coef, .... on entry
The code generates an RData file (ergm_dump.rda) that contains everything needed to reproduce the fit (which failed). Used in this way, you don't need to turn the trace function off: all it does is write the file each time ergm is called. If you re-run the script, it will overwrite the file.
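To see the mechanism without ergm or any protected data, here is a minimal base-R sketch. The function fit_model, its arguments, and the file name toy_dump.rda are made up for illustration; they only stand in for ergm and its arguments. The tracer expression is evaluated inside the traced call, which is why ls() picks up that call's arguments.

```r
# Hypothetical stand-in for ergm(): a function that fails after it is called.
fit_model <- function(formula, data, verbose = FALSE) {
  stop("simulated failure inside the fit")
}

# Write the dump to a temporary directory (illustrative file name).
dump_file <- file.path(tempdir(), "toy_dump.rda")

# On entry to fit_model(), save every object in the call's frame,
# i.e. the arguments the function was called with.
trace(fit_model,
      bquote(save(list = ls(), file = .(dump_file))),
      print = FALSE)

# The call fails, but the tracer has already written the file.
try(fit_model(y ~ x, data = data.frame(x = 1:3, y = c(2, 4, 6))),
    silent = TRUE)

# Optional: remove the tracer once the dump exists.
untrace(fit_model)

# Load the dump into its own environment so it cannot clobber the workspace.
captured <- new.env()
load(dump_file, envir = captured)
print(ls(captured))
```

Calling untrace() is optional here, as in the ergm example, but it keeps later calls in the same session from rewriting the file.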
The network data, which in this case originate with the restricted ARTnet dataset, are included in the rda file, not as the original data, but as the initialized empty network object of size ppopsize, with the appropriate nodal attributes. This is great, because this empty network is shareable, while the original data are not. The target stats contain the edge info needed for estimation. And where is this network? It is attached to the environment of the formula (as Pavel puts it, "formulas are devious").
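The "devious" part is plain R behaviour and can be sketched without any network packages. In the example below, make_formula and the list standing in for popnw are assumptions for illustration only; the real popnw is a network object built by ergm.ego. A formula created inside a function keeps that function's environment, and save() writes that environment out along with the formula.

```r
# Hypothetical helper: builds a formula inside a function, so the formula's
# environment is the function's frame rather than the global environment.
make_formula <- function() {
  # A plain list standing in for the empty pseudopopulation network.
  popnw <- list(vertices = 14410, total_edges = 0)
  popnw ~ edges
}

f <- make_formula()

# The formula quietly carries the object named on its left-hand side.
print(ls(environment(f)))

# Saving the formula also saves its (non-global) environment.
rda_file <- tempfile(fileext = ".rda")
save(f, file = rda_file)

# Restore into a fresh environment and pull the object back out.
restored <- new.env()
load(rda_file, envir = restored)
restored_popnw <- environment(restored$f)$popnw
print(restored_popnw$vertices)
```

Note that this only happens for environments other than the global one: a formula typed at the top level refers to the global environment, and save() does not store the workspace contents with it.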
> load("ergm_dump.rda")
> ls()
[1] "constraints" "control" "estimate"
[4] "eval.loglik" "formula" "obs.constraints"
[7] "offset.coef" "reference" "response"
[10] "target.stats" "verbose"
> ls(environment(formula))
[1] "popnw"
> environment(formula)$popnw
Network attributes:
vertices = 14410
directed = FALSE
hyper = FALSE
loops = FALSE
multiple = FALSE
bipartite = FALSE
total edges= 0
missing edges= 0
non-missing edges= 0
Vertex attribute names:
.ego.ind age.grp deg.casl deg.main deg.tot diag.status ego.id race rate.oo.part region risk.grp role.class sqrt.age vertex.names
No edge attributes
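On the receiving end, one way to re-run the fit from the dump might look like the sketch below. This is an assumption about how one could use the file, not a documented workflow: dump_path, ergm_args and refit are illustrative names, and depending on the model terms you may also need to load ergm.ego. The saved objects are named after ergm's arguments, so they can be passed back by name with do.call().

```r
# Path to the file attached to the issue (adjust as needed).
dump_path <- "ergm_dump.rda"

if (file.exists(dump_path) && requireNamespace("ergm", quietly = TRUE)) {
  library(ergm)

  # Load into a separate environment rather than the global workspace.
  ergm_args <- new.env()
  load(dump_path, envir = ergm_args)

  # Pass the saved arguments back to ergm() by name. try() keeps the
  # session alive if the fit fails again, which is the point of a reprex.
  refit <- try(do.call(ergm, as.list(ergm_args)))
} else {
  message("Need ", dump_path, " and the ergm package to re-run the fit.")
}
```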
The rda file is quite small (for a network of size 14K, it is < 200KB), and it can be zipped and attached to a GitHub issue, as it is here.
This is just great!