Easy reprex for ergm with protected data: the trace function - statnet/computing GitHub Wiki

To debug someone else's ergm analysis, we often need access to the data they are using as well as the script. A very neat way to bundle all of this up is the base R trace() function, which inserts an expression into a function so that it runs every time that function is called. Looking at the documentation for trace(), I honestly don't know how Pavel figured out it could be used for this purpose. Here is an example from a model we just debugged:

```r
# Run once before the script: from now on, every call to ergm() first saves
# everything ls() sees on entry (i.e. ergm()'s arguments) to ergm_dump.rda.
trace(ergm, quote(save(list=ls(), file="ergm_dump.rda")))

ego_inst <- rm_nas(all_ego_data$inst)
ego_wt_i <- dplyr::left_join(ego_inst$egos, WAwt, by = "ego.id")
ego_inst$egoWt <- ego_wt_i$weight

model_inst <- ego_inst ~ edges +
  nodefactor("deg.main", levels = I(0)) +
  nodefactor("deg.casl", levels = I(0)) +
  nodefactor("risk.grp", levels = -5) +
  nodefactor("race", levels = -3) +
  nodefactor("region", levels = -2) +
  nodematch("race", diff = TRUE) +
  absdiff("sqrt.age") +
  offset(nodematch("role.class", diff = TRUE, levels = 1:2))

# ergm.ego() calls ergm() internally, which is what triggers the tracer.
inst_fit <- ergm.ego(model_inst,
                     offset.coef = rep(-Inf, 2),
                     control = control.ergm.ego(ppopsize = ppop))
```

trace() quietly lets you know it's working, both when the trace is set and when ergm.ego() calls ergm() internally:

```
> trace(ergm, quote(save(list=ls(), file="ergm_dump.rda")))
Tracing function "ergm" in package "ergm"
[1] "ergm"
...
> inst_fit <- ergm.ego(model_inst,
+                      offset.coef = rep(-Inf, 2),
+                      control = control.ergm.ego(ppopsize=ppop)
+                      )
Constructing pseudopopulation network.
Note: Constructed network has size 14410, different from requested 15000. Estimation should not be meaningfully affected.
Tracing ergm(ergm.formula, target.stats = m, offset.coef = ergm.offset.coef,  .... on entry
```

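The code generates an R data file (ergm_dump.rda) that contains everything needed to reproduce the fit (which, in this case, failed). Because the tracer runs on entry to ergm(), ls() lists ergm()'s arguments, so the file holds exactly the arguments that call received. Used this way, the trace doesn't need to be turned off: it creates the file and it's done. Keep in mind that every later call to ergm() in the same session rewrites the file, so re-running the script overwrites it.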
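To see the mechanism without ergm or protected data, here is a minimal base-R sketch. `fit_model()` is a hypothetical stand-in for ergm(), and the tracer is the same `save(list = ls(), ...)` expression, except that bquote() is used to embed a temporary file path. It also shows two optional steps the example above doesn't need: untrace() to stop dumping, and replaying the saved arguments with do.call().

```r
# Hypothetical stand-in for an expensive fitting function such as ergm().
fit_model <- function(formula, target.stats = NULL, verbose = FALSE) {
  length(all.vars(formula))  # placeholder "fit": count the formula's variables
}

dump_file <- file.path(tempdir(), "fit_dump.rda")

# Insert save(list = ls(), ...) at the top of fit_model(). The tracer is
# evaluated in fit_model()'s own frame on entry, so ls() lists its arguments.
trace(fit_model,
      tracer = bquote(save(list = ls(), file = .(dump_file))),
      print = FALSE)

res <- fit_model(y ~ a + b, target.stats = c(10, 4))

# Optional: stop tracing so that later calls don't overwrite the dump.
untrace(fit_model)

# On the debugging side: load the dump into its own environment...
dump_env <- new.env()
load(dump_file, envir = dump_env)
print(sort(ls(dump_env)))  # "formula" "target.stats" "verbose"

# ...and replay the original call with exactly the saved arguments.
replayed <- do.call(fit_model, as.list(dump_env))
print(replayed)
```

For the real dump, the analogous replay would pass the objects in ergm_dump.rda back to ergm(). That is an untested assumption: it presumes the same ergm version on both ends (the saved names are that version's arguments), and ergm should be untraced first so the replay doesn't overwrite the dump.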

The network data, which in this case originate from the restricted ARTnet dataset, are included in the rda file not as the original data but as the initialized, empty pseudopopulation network (here 14410 nodes, close to the requested ppopsize of 15000) with the appropriate nodal attributes. This is great, because this empty network is shareable, while the original data are not. The target statistics (target.stats) carry the tie information needed for estimation, so the network itself can be empty. And where is this network? It is attached to the environment of the formula (as Pavel puts it, "formulas are devious").

```
> load("ergm_dump.rda")
> ls()
 [1] "constraints"     "control"         "estimate"       
 [4] "eval.loglik"     "formula"         "obs.constraints"
 [7] "offset.coef"     "reference"       "response"       
[10] "target.stats"    "verbose"        
> ls(environment(formula))
[1] "popnw"
> environment(formula)$popnw
 Network attributes:
  vertices = 14410 
  directed = FALSE 
  hyper = FALSE 
  loops = FALSE 
  multiple = FALSE 
  bipartite = FALSE 
  total edges= 0 
    missing edges= 0 
    non-missing edges= 0 

 Vertex attribute names: 
    .ego.ind age.grp deg.casl deg.main deg.tot diag.status ego.id race rate.oo.part region risk.grp role.class sqrt.age vertex.names 

No edge attributes
```
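Why does popnw come along? An R formula records the environment it was created in, and save() serializes that environment together with the formula. Here is a minimal base-R sketch of the same effect; `make_formula()` and its toy `popnw` are hypothetical stand-ins, not ergm.ego internals.

```r
# Hypothetical builder: a local object plus a formula that refers to it.
make_formula <- function() {
  popnw <- letters[1:5]  # stand-in for the pseudopopulation network
  popnw ~ edges          # the formula's environment is this call's frame
}

f <- make_formula()
path <- file.path(tempdir(), "formula_dump.rda")
save(f, file = path)

# Reload into a fresh environment: the formula's environment travels with it.
fresh <- new.env()
load(path, envir = fresh)
print(ls(environment(fresh$f)))  # "popnw"

# The left-hand side can be evaluated in the formula's own environment.
lhs <- eval(fresh$f[[2]], environment(fresh$f))
print(lhs)  # "a" "b" "c" "d" "e"
```

The flip side is that anything else living in that environment gets saved too, so checking `ls(environment(formula))` is a cheap sanity check before sharing a dump made from protected data.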

The .rda file is quite small (under 200 KB for this 14K-node network), so it can be zipped and attached to a GitHub issue, as it is here.

This is just great!