calibration_multimodal_from_synthetic - laser-base/laser-measles GitHub Wiki

Experiment V2: Two-Phase Spatial Probe for Gravity Mixing Calibration

Date: 2026-04-23
Figure: experiment_v2_summary.png
Scripts: generate_reference_v2.py, calibrate_v2.py, experiment_v2_summary.py


Background and Motivation

Experiment V1 attempted to calibrate a three-parameter spatial measles model — transmission rate β, gravity mixing scale k, and distance exponent c — using an SIA (supplemental immunization activity) in cluster_B as a perturbation to help identify k. The design failed for k: the SIA fired before the epidemic reached cluster_B, so cluster_B's pre-existing immunity was set entirely by the SIA regardless of how much natural cross-cluster transmission had occurred. The loss surface was flat in the k direction (identifiability ratio = 0.000), and the best-fit k was 480% off the true value.

The core diagnosis: k only matters if cross-cluster coupling has had time to leave a trace. An SIA that fires before any infection arrives erases the evidence before it can be observed.


Experimental Design

The V2 design keeps the same scenario — 50 patches split into two clusters of 25, ~2.3 M agents, no vital dynamics — but repositions the SIA to fire mid-invasion.
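To make the roles of k and c concrete, here is a minimal sketch of a generic gravity coupling kernel between patches. Several things here are assumptions for illustration only: the functional form M[i, j] = k · N_i^a · N_j^b / d_ij^c, the exponents a = b = 1, the helper names `gravity_matrix` and `cross_cluster_coupling`, and the synthetic two-cluster geometry. This page does not give laser-measles' exact gravity formulation or normalization.

```python
import numpy as np

def gravity_matrix(pops, coords, k, c, a=1.0, b=1.0):
    """Hypothetical gravity kernel: M[i, j] = k * pops[i]**a * pops[j]**b / d_ij**c.

    Self-coupling (i == j) is zero. The exponents a, b and the absence of any
    row normalization are illustrative assumptions, not laser-measles internals.
    """
    pops = np.asarray(pops, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between patch centroids.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # x / inf**c == 0, so the diagonal vanishes
    return k * np.outer(pops ** a, pops ** b) / dist ** c

def cross_cluster_coupling(M, from_ids, to_ids):
    """Total coupling weight from one set of patches to another."""
    return float(M[np.ix_(from_ids, to_ids)].sum())

# Two synthetic clusters of 25 patches each, well separated in space.
rng = np.random.default_rng(0)
pops = rng.integers(10_000, 100_000, size=50)
coords = np.vstack([rng.normal(0.0, 1.0, (25, 2)), rng.normal(10.0, 1.0, (25, 2))])
cluster_A, cluster_B = np.arange(25), np.arange(25, 50)

for k in (0.001, 0.05, 0.15):
    M = gravity_matrix(pops, coords, k=k, c=1.5)
    print(f"k={k:<5}  A->B coupling = {cross_cluster_coupling(M, cluster_A, cluster_B):.3e}")
```

In this form, cross-cluster coupling scales linearly with k. The exponent c controls how steeply the large inter-cluster distances discount that coupling. This is the sense in which k sets the rate at which Phase-1 infections leak into cluster_B.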

Timeline

Phase Ticks What happens
Burn-in 0 – 199 Fully susceptible population
Phase 1 200 – 265 Import seeds cluster_A (5 largest patches, rate 3/day for 4 days); epidemic spreads within cluster_A and leaks into cluster_B at a rate controlled by k
SIA fires tick 265 70%-efficacy campaign targets all of cluster_B
Phase 2 266 – end Within-cluster epidemic in cluster_B, driven by the patches already seeded before the SIA; cluster_A continues to burn out

True parameters: β = 0.50, k = 0.050, c = 1.50
SIA efficacy: 70% (raises cluster_B's effective immune fraction from ~0% to 70%, leaving R_eff ≈ 1.2: enough to sustain local spread from seeded patches, but not enough to saturate the cluster)
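The R_eff figure can be sanity-checked with the homogeneous-mixing approximation R_eff = R0 · (1 − immune fraction). The page does not state R0 directly. Under that approximation, R_eff ≈ 1.2 at 70% immunity implies R0 ≈ 1.2 / 0.3 = 4. The helper names below are hypothetical.

```python
def r_eff(r0, immune_fraction):
    """Homogeneous-mixing approximation: R_eff = R0 * (1 - immune fraction)."""
    return r0 * (1.0 - immune_fraction)

def implied_r0(r_eff_value, immune_fraction):
    """Invert r_eff(): the basic reproduction number consistent with a given R_eff."""
    return r_eff_value / (1.0 - immune_fraction)

r0 = implied_r0(1.2, 0.70)
print(f"implied R0 ~ {r0:.2f}")                       # 4.00
print(f"R_eff with no SIA:   {r_eff(r0, 0.0):.2f}")   # 4.00
print(f"R_eff after 70% SIA: {r_eff(r0, 0.70):.2f}")  # 1.20
```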

The key observable is the patch invasion count: how many of the 25 cluster_B patches have any recorded infection before tick 265. At the true k this is 11/25. The count is sensitive to k across the plausible range: k = 0.001 yields ~0–1 patches invaded; k = 0.05 yields ~11; k = 0.15 yields ~20+. A complementary, coarser signal — cluster_B's cumulative recovered fraction R(t_SIA) — rises monotonically with k and is visible to a deterministic compartmental model, making it useful for Stage 1 calibration.
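Below is a minimal sketch of the invasion-count observable. It assumes `I_by_patch` (as saved in abm_reference_v2/) is a (ticks × patches) array of infectious counts whose row index equals the tick, and that the cluster_B patch indices are known. The array layout, the patch ordering, and the function names are all assumptions, not the project's actual code.

```python
import numpy as np

SIA_TICK = 265  # SIA tick from the timeline above

def first_infection_ticks(I_by_patch):
    """First tick with I > 0 in each patch, or -1 if the patch is never infected.

    Assumes I_by_patch has shape (n_ticks, n_patches) and row index == tick."""
    infected = np.asarray(I_by_patch) > 0
    first = infected.argmax(axis=0)  # index of the first True (0 if none)
    return np.where(infected.any(axis=0), first, -1)

def invasion_count(I_by_patch, patch_ids, sia_tick=SIA_TICK):
    """Number of patches in patch_ids with any infection strictly before sia_tick."""
    first = first_infection_ticks(I_by_patch)[patch_ids]
    return int(((first >= 0) & (first < sia_tick)).sum())

# Toy example with four patches.
I = np.zeros((400, 4))
I[210:, 0] = 5   # invaded at tick 210 -> counted
I[264:, 1] = 1   # invaded at tick 264 -> counted
I[265:, 2] = 2   # invaded at the SIA tick itself -> not counted
                 # patch 3 is never infected -> not counted
print(invasion_count(I, [0, 1, 2, 3]))  # 2

# With real data (hypothetical patch ordering, cluster_B = patches 25..49):
# invasion_count(np.load("abm_reference_v2/I_by_patch.npy"), np.arange(25, 50))
```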


Reference Simulation

The reference dataset is a single ABM run at true parameters.

Metric Value
Global peak infectious tick 299
cluster_A final attack rate 98.0%
cluster_B final attack rate 79.3%
SIA baseline immunity 70%
k-attributable excess AR in cluster_B +9.3 pp
cluster_B patches invaded before SIA 11 / 25
Inter-cluster arrival lag 41.4 ticks

The 9.3 pp gap between the SIA floor (70%) and the realized cluster_B attack rate (79.3%) is the integrated signal of cross-cluster coupling during Phase 1. It is far enough from saturation to remain sensitive to k, yet large enough to stand out from the noise of a single stochastic reference run.
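The 9.3 pp figure is plain arithmetic on the cluster-level recovered fraction. This sketch assumes `R_final` holds per-patch final recovered counts that include SIA-immunized agents, which is consistent with treating 70% as a floor. The toy numbers are chosen to reproduce the reported 79.3%, and the function names are hypothetical.

```python
import numpy as np

def cluster_attack_rate(R_final, pops, patch_ids):
    """Population-weighted final recovered fraction over a set of patches.

    Assumes R_final counts every immune agent, SIA-immunized ones included."""
    R_final = np.asarray(R_final, dtype=float)
    pops = np.asarray(pops, dtype=float)
    return float(R_final[patch_ids].sum() / pops[patch_ids].sum())

def excess_over_floor_pp(attack_rate, floor=0.70):
    """k-attributable excess above the SIA floor, in percentage points."""
    return 100.0 * (attack_rate - floor)

# Toy numbers chosen to reproduce the reported 79.3% cluster_B attack rate.
ar_B = cluster_attack_rate(R_final=[800, 786], pops=[1000, 1000], patch_ids=[0, 1])
print(f"AR = {ar_B:.1%}, excess = {excess_over_floor_pp(ar_B):.1f} pp")  # AR = 79.3%, excess = 9.3 pp
```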


Figure Walk-through

All nine panels are in experiment_v2_summary.png, a 3×3 grid with panels labelled (row, column).

Row 0 — Reference and Design

  • (0,0) Experiment timeline schematic. The yellow band marks Phase 1 (the k-signal window); the blue band marks Phase 2 (post-SIA within-cluster epidemic). The schematic highlights that the import seeds cluster_A, the gravity model leaks infections into cluster_B during Phase 1, and the SIA interrupts that process at tick 265.

  • (0,1) Reference ABM global I(t). A single clean epidemic peak at tick 299, 34 ticks after the SIA. Phase 1 covers the early rising edge; Phase 2 contains the peak itself and the declining tail that follows the SIA's removal of a large fraction of cluster_B's remaining susceptibles.

  • (0,2) Cluster-level R(t). cluster_A (blue) saturates near 98%; cluster_B (red) reaches 79.3%, annotated with the 9.3 pp gap above the SIA-floor baseline. The gap is visible as a clear separation between the 70% dashed line and the cluster_B final value.

Row 1 — k Identifiability

  • (1,0) Per-patch invasion timeline. Each horizontal bar is one cluster_B patch, sorted by its first-infection tick. The 11 patches (red) that were seeded strictly before tick 265 are the Phase-1 k signal. The 14 gray bars represent patches that were first infected by within-cluster spread after the SIA or were never reached. This is the ABM-only observable: the compartmental model cannot resolve individual patch invasions.

  • (1,1) Loss surface β×k. The 8×8 CMP sweep (fixed c = 1.50) shows a clear, tight minimum near the true k = 0.05 (lime crosshairs). The orange diamond marks the Optuna CMP best (k = 0.052). Contrast with V1, where this panel showed a horizontal stripe with no variation in the k direction.

  • (1,2) CMP-visible k signal. Extracted from the same sweep: cluster_B's cumulative R at tick 265 as a function of k, at β ≈ 0.50. The monotonic rise from ~60% at k = 0 to ~80%+ at k = 0.20 is the mechanism that gives the CMP leverage on k. Without this signal the loss surface is flat; with it, k acquires a well-defined gradient.
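Why should cluster_B's R(t_SIA) rise monotonically with k? A toy two-population deterministic SIR illustrates the mechanism. Cluster_B's force of infection is β(I_B + k·I_A), so a larger k seeds cluster_B earlier, and more of it has recovered by the time the SIA fires. Every ingredient here is an assumption for illustration: symmetric coupling k, γ = 0.125 (so β/γ = 4, an assumed R0), a 65-tick Phase 1, and forward-Euler integration. This is not the CMP used by calibrate_v2.py, and it does not reproduce the panel's absolute values.

```python
import numpy as np

def toy_rb_at_sia(k, beta=0.5, gamma=0.125, i0=1e-5, t_sia=65.0, dt=0.1):
    """Cumulative recovered fraction in cluster B at the SIA time for a toy
    two-population SIR with symmetric coupling k (all values hypothetical)."""
    s = np.array([1.0 - i0, 1.0])  # susceptible fractions [cluster_A, cluster_B]
    i = np.array([i0, 0.0])        # only cluster_A is seeded
    r = np.zeros(2)
    for _ in range(int(round(t_sia / dt))):
        # Force of infection: own cluster plus k-weighted other cluster.
        lam = beta * (i + k * i[::-1])
        new_inf = lam * s * dt
        new_rec = gamma * i * dt
        s = s - new_inf
        i = i + new_inf - new_rec
        r = r + new_rec
    return float(r[1])

for k in (0.0, 0.001, 0.01, 0.05, 0.15):
    print(f"k={k:<6} R_B(t_SIA) = {toy_rb_at_sia(k):.3f}")
```

With k = 0, cluster_B is never infected, so its R stays at zero. Any positive k opens a channel whose strength grows with k.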

Row 2 — Calibration Results

  • (2,0) Global I(t) fit. The CMP best (blue dashed) closely tracks the ABM reference (black) in peak height and timing. The ABM best (red dash-dot) is also a reasonable fit globally, though it over-estimates the tail duration.

  • (2,1) Cluster R(t) fit — CMP. The CMP recovers both cluster curves accurately, including the final 9.3 pp gap. This confirms that the loss function correctly captures the cross-cluster coupling signal.

  • (2,2) Parameter recovery: V1 vs V2. Bar chart of relative error for each parameter, comparing both experiments. The headline result is the k column: CMP error dropped from +481% (V1) to +4% (V2). ABM k error fell from +184% to +98% — a meaningful improvement but not yet converged. β and c are recovered well in both versions.


Calibration Results

Stage 1 — Compartmental (Optuna, 100 trials)

Parameter True Best-fit Error
β 0.500 0.548 +9.6%
k 0.050 0.052 +3.8%
c 1.500 1.690 +12.6%

Loss: 0.006, down from ~20.0 in V1, where every trial landed in invalid parameter space.

Stage 2 — ABM warm-start (Optuna, 40 trials × 3 seeds)

Parameter True Best-fit Error
β 0.500 0.519 +3.7%
k 0.050 0.099 +98%
c 1.500 2.544 +70%

Loss: 0.856
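The Error columns are signed relative errors, 100 · (best − true) / true. The following recomputes them from the tabulated best-fit values:

```python
TRUE = {"beta": 0.50, "k": 0.050, "c": 1.50}
STAGE1_CMP = {"beta": 0.548, "k": 0.052, "c": 1.690}
STAGE2_ABM = {"beta": 0.519, "k": 0.099, "c": 2.544}

def relative_error_pct(best, true):
    """Signed relative error, in percent, for every parameter in `true`."""
    return {p: 100.0 * (best[p] - true[p]) / true[p] for p in true}

for name, best in (("Stage 1 CMP", STAGE1_CMP), ("Stage 2 ABM", STAGE2_ABM)):
    errs = relative_error_pct(best, TRUE)
    print(name, {p: f"{e:+.1f}%" for p, e in errs.items()})
```

In a few places this differs from the tables in the last digit (e.g. +4.0% rather than +3.8% for Stage 1 k). The likely reason is that the tables were computed from unrounded fits, while the best-fit values shown are rounded to three decimals.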

Identifiability Ratios (8×8 CMP sweep)

Parameter Ratio Verdict
k 0.836 Well-identified
c 0.910 Well-identified
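This page does not define the identifiability ratio. One plausible definition is the relative range of the profile loss along one parameter axis. It is offered purely as a hypothetical sketch, not as what calibrate_v2.py actually computes. It gives 0 for a surface that is flat in that direction (as with V1's k, ratio = 0.000) and approaches 1 for a deep, well-localized minimum.

```python
import numpy as np

def identifiability_ratio(loss_grid, axis):
    """Hypothetical metric: relative range of the profile loss along one axis.

    loss_grid[i, j] is the loss at (beta_i, k_j); axis=0 profiles beta, axis=1
    profiles k. Returns 0.0 when the surface is flat along that parameter."""
    loss_grid = np.asarray(loss_grid, dtype=float)
    # Profile loss: best loss over the other parameter for each value of this one.
    profile = loss_grid.min(axis=1 - axis)
    hi, lo = profile.max(), profile.min()
    return 0.0 if hi == 0 else float((hi - lo) / hi)

betas = np.linspace(0.3, 0.7, 8)
ks = np.linspace(0.0, 0.2, 8)
sharp = np.add.outer((betas - 0.5) ** 2, (ks - 0.05) ** 2)  # bowl centred on the truth
flat = np.add.outer((betas - 0.5) ** 2, np.zeros_like(ks))  # no k dependence (V1-like)
print(identifiability_ratio(sharp, axis=1), identifiability_ratio(flat, axis=1))
```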

Discussion

What worked

The V2 redesign solves the k identifiability problem at the compartmental level. By firing the SIA mid-invasion rather than pre-epidemic, the experiment creates a monotonic relationship between k and the cluster_B R(t_SIA) observable. The CMP detects this relationship and converges to within ~4% of the true k in 100 Optuna trials. This is the surrogate-calibration paradigm in practice: a fast deterministic model navigates parameter space using cluster-level signals, then hands off to the expensive stochastic model for patch-level refinement.
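The two-stage hand-off can be sketched in a few lines. To keep the example self-contained, plain uniform random search stands in for Optuna. Both loss functions are toy stand-ins: a smooth quadratic plays the CMP, and the same quadratic plus Gaussian noise plays the ABM. The search ranges, the 20% warm-start box, and all helper names are assumptions. Only the trial counts mirror the 100-trial Stage 1 and the 40-trial × 3-seed Stage 2 reported above.

```python
import random

# Hypothetical search ranges; the real ones used by calibrate_v2.py are not listed here.
BOUNDS = {"beta": (0.2, 1.0), "k": (0.001, 0.2), "c": (0.5, 3.0)}
TRUE = {"beta": 0.50, "k": 0.050, "c": 1.50}

def cheap_loss(params):
    """Stand-in for the deterministic CMP loss: smooth and noise-free."""
    return sum(((params[p] - TRUE[p]) / TRUE[p]) ** 2 for p in TRUE)

_noise = random.Random(1234)

def expensive_loss(params, seed):
    """Stand-in for one stochastic ABM run: the same loss plus run-level noise.
    `seed` would select the ABM's RNG stream; this stand-in ignores it."""
    return cheap_loss(params) + _noise.gauss(0.0, 0.05)

def random_search(loss, bounds, n_trials, rng):
    """Uniform random search, used here in place of Optuna's sampler."""
    best, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {p: rng.uniform(lo, hi) for p, (lo, hi) in bounds.items()}
        value = loss(params)
        if value < best_loss:
            best, best_loss = params, value
    return best, best_loss

def shrink(bounds, center, frac=0.2):
    """Box around `center` spanning `frac` of each range, clipped to the original bounds."""
    box = {}
    for p, (lo, hi) in bounds.items():
        half = 0.5 * frac * (hi - lo)
        box[p] = (max(lo, center[p] - half), min(hi, center[p] + half))
    return box

def two_stage(bounds, n1=100, n2=40, n_seeds=3, seed=0):
    rng = random.Random(seed)
    # Stage 1: many cheap surrogate evaluations over the full box.
    stage1, _ = random_search(cheap_loss, bounds, n1, rng)
    # Stage 2: few expensive evaluations, warm-started in a box around stage 1.
    local = shrink(bounds, stage1)

    def seed_averaged(params):
        return sum(expensive_loss(params, s) for s in range(n_seeds)) / n_seeds

    stage2, loss2 = random_search(seed_averaged, local, n2, rng)
    return stage1, local, stage2, loss2

stage1, local, stage2, loss2 = two_stage(BOUNDS)
print("stage 1:", {p: round(v, 3) for p, v in stage1.items()})
print("stage 2:", {p: round(v, 3) for p, v in stage2.items()})
```

The design choice mirrors the text. Many evaluations are spent where they are cheap. The expensive, noisy evaluations are then confined to a neighborhood the surrogate has already located.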

What still needs work

The ABM Stage 2 recovers k with +98% error. This is better than V1's +184%, but far from the CMP's ~4%. Two likely causes:

  1. Noisy invasion count. The patch invasion count (the ABM-only k term) is stochastic: across three seeds with the same parameters, the count varies by ±2–4 patches. With W_INVASION = 3.0, this term dominates the loss, yet it is too noisy to guide the Optuna search cleanly within 40 trials. More seeds, more trials, or a smoother k observable would help.

  2. c–k correlation in Phase 2. After the SIA, the within-cluster epidemic in cluster_B involves both within-cluster transmission (influenced by c through patch-to-patch distances) and continued cross-cluster seeding (influenced by k). The ABM loss function may be trading k against c in Phase 2, since both affect the realized cluster_B final AR. Adding a loss term that specifically isolates the Phase-1 period (e.g., cluster_B I(t) for t < SIA_TICK only) could break this correlation.
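Both remedies above can be expressed as loss terms. The sketch below makes several assumptions: cluster_B I(t) is available as an array aligned with a `ticks` array, the invasion count is normalized by the 25 cluster_B patches, and the count is averaged over seeds before it is compared with the reference. Only W_INVASION = 3.0 and the tick-265 SIA come from this page. The function names and the squared-error form are hypothetical, and so is how such terms would be combined in calibrate_v2.py.

```python
import numpy as np

SIA_TICK = 265     # SIA fires at tick 265
W_INVASION = 3.0   # invasion-term weight quoted above

def phase1_loss(sim_I_B, ref_I_B, ticks, sia_tick=SIA_TICK):
    """Mean squared error of cluster_B I(t), restricted to Phase 1 (t < sia_tick)."""
    mask = np.asarray(ticks) < sia_tick
    diff = np.asarray(sim_I_B, dtype=float)[mask] - np.asarray(ref_I_B, dtype=float)[mask]
    return float(np.mean(diff ** 2))

def seed_mean_and_se(counts):
    """Mean invasion count across seeds and its standard error (needs >= 2 seeds)."""
    counts = np.asarray(counts, dtype=float)
    return float(counts.mean()), float(counts.std(ddof=1) / np.sqrt(counts.size))

def invasion_term(sim_counts, ref_count, n_patches=25, weight=W_INVASION):
    """Weighted squared error of the seed-averaged invasion fraction."""
    mean_count, _ = seed_mean_and_se(sim_counts)
    return weight * ((mean_count - ref_count) / n_patches) ** 2

# Example: three seeds scattering around the reference count of 11.
print(seed_mean_and_se([9, 11, 13]))   # (11.0, ~1.155)
print(invasion_term([9, 11, 13], 11))  # 0.0
```

Averaging over n seeds shrinks the standard error of the mean count by a factor of √n. That is the practical argument for running more seeds. Because the Phase-1 term ignores everything from the SIA tick onward, Phase-2 dynamics cannot trade c against k inside it.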

Design principle

The V1→V2 progression illustrates a general lesson: the experiment must be designed so that the target parameter leaves a trace in the observables before any intervention erases it. The intervention itself (the SIA) is useful precisely because it creates a contrast — but only if it fires after the parameter-dependent process has run long enough to be measurable. Optimal experiment design for spatial calibration should therefore ask: for each parameter, what is the earliest observable signature, and does the intervention window allow that signature to be read?


Files

File Description
generate_reference_v2.py Runs the reference ABM, saves abm_reference_v2/
calibrate_v2.py Identifiability sweep + CMP + ABM calibration
experiment_v2_summary.py Generates experiment_v2_summary.png
experiment_v2_summary.png 3×3 panel report figure
sweep_identifiability_v2.png 10×10 sweep (from calibrate_v2.py)
calibration_diagnostics_v2.png Per-patch AR, arrival histograms, convergence
abm_reference_v2/ Reference data: I_by_patch.npy, R_by_patch.npy, patch_summary.csv