calibration_multimodal_from_synthetic - laser-base/laser-measles GitHub Wiki

Experiment V2: Two-Phase Spatial Probe for Gravity Mixing Calibration

Date: 2026-04-23
Figure: experiment_v2_summary.png
Scripts: generate_reference_v2.py, calibrate_v2.py, experiment_v2_summary.py

Background and Motivation

Experiment V1 attempted to calibrate a three-parameter spatial measles model — transmission rate β, gravity mixing scale k, and distance exponent c — using an SIA (supplemental immunization activity) in cluster_B as a perturbation to help identify k. The design failed for k: the SIA fired before the epidemic reached cluster_B, so cluster_B's pre-existing immunity was set entirely by the SIA regardless of how much natural cross-cluster transmission had occurred. The loss surface was flat in the k direction (identifiability ratio = 0.000), and the best-fit k was 480% off the true value.

The core diagnosis: k only matters if cross-cluster coupling has had time to leave a trace. An SIA that fires before any infection arrives erases the evidence before it can be observed.
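For orientation, the roles of k and c can be illustrated with a generic gravity kernel. This is only a sketch under an assumed form, M[i, j] = k * pop[j] / dist[i, j]**c. The kernel actually used by the model may differ (for example, it may carry population exponents), and the function name `gravity_coupling` and the toy inputs are hypothetical.

```python
import numpy as np

def gravity_coupling(pop, dist, k, c):
    """Toy gravity coupling: M[i, j] = k * pop[j] / dist[i, j]**c, zero diagonal.

    Assumed form for illustration only; not taken from the model source.
    """
    pop = np.asarray(pop, dtype=float)
    dist = np.asarray(dist, dtype=float)
    M = np.zeros_like(dist)
    off = ~np.eye(len(pop), dtype=bool)                 # skip self-coupling
    dest_pop = np.broadcast_to(pop, dist.shape)         # element [i, j] is pop[j]
    M[off] = k * dest_pop[off] / dist[off] ** c
    return M

# Two hypothetical patches 10 distance units apart.
pop = [100, 200]
dist = [[0, 10], [10, 0]]
M = gravity_coupling(pop, dist, k=0.05, c=1.5)
M_double_k = gravity_coupling(pop, dist, k=0.10, c=1.5)
```

In this form k scales every off-diagonal entry linearly, while c controls how quickly coupling decays with distance, which is why the two can trade off against each other when only aggregate outcomes are observed.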

Experimental Design

The V2 design keeps the same scenario — 50 patches split into two clusters of 25, ~2.3 M agents, no vital dynamics — but repositions the SIA to fire mid-invasion.

Timeline

Phase	Ticks	What happens
Burn-in	0 – 199	Fully susceptible population
Phase 1	200 – 265	Import seeds cluster_A (5 largest patches, rate 3/day for 4 days); epidemic spreads within cluster_A and leaks into cluster_B at a rate controlled by k
SIA fires	tick 265	70%-efficacy campaign targets all of cluster_B
Phase 2	266 – end	Within-cluster epidemic in cluster_B drives from the patches already seeded before the SIA; cluster_A continues to burn out
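The timeline above can be captured as a small configuration object. This is a minimal sketch: the tick values are copied from the table, but the class name `Timeline`, its field names, and the `phase` helper are illustrative and not taken from the experiment scripts.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Timeline:
    """Tick boundaries from the timeline table (names are hypothetical)."""
    import_start: int = 200     # first tick of Phase 1
    sia_tick: int = 265         # SIA fires at the last tick of Phase 1
    import_days: int = 4        # duration of the importation
    import_rate: int = 3        # importations per day
    sia_efficacy: float = 0.70

    def phase(self, tick: int) -> str:
        """Map a tick to the phase label used in the table."""
        if tick < self.import_start:
            return "burn-in"
        if tick <= self.sia_tick:
            return "phase-1"
        return "phase-2"

    def is_sia_tick(self, tick: int) -> bool:
        return tick == self.sia_tick

timeline = Timeline()
print([timeline.phase(t) for t in (0, 199, 200, 265, 266)])
```

Keeping the boundaries in one frozen object makes it harder for the reference generator and the calibration loss to disagree about where Phase 1 ends.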

True parameters: β = 0.50, k = 0.050, c = 1.50
SIA efficacy: 70% (raises cluster_B effective immune fraction from ~0% to 70%, leaving R_eff ≈ 1.2 — enough to sustain local spread from seeded patches but not saturating)
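The R_eff figure can be sanity-checked with the usual well-mixed approximation R_eff = R0 × (1 − immune fraction). That relation is an assumption here (the page does not state how R_eff was computed), and the function names are illustrative.

```python
def implied_r0(r_eff: float, immune_fraction: float) -> float:
    """Invert R_eff = R0 * (1 - immune_fraction) (well-mixed approximation)."""
    return r_eff / (1.0 - immune_fraction)

def effective_r(r0: float, immune_fraction: float) -> float:
    """Effective reproduction number at a given immune fraction."""
    return r0 * (1.0 - immune_fraction)

r0 = implied_r0(1.2, 0.70)                 # values quoted for the SIA
r_eff_final = effective_r(r0, 0.793)       # at cluster_B's final 79.3%
print(round(r0, 2), round(r_eff_final, 2))
```

Under this approximation the quoted values imply an R0 of about 4, and R_eff falls below 1 by the time cluster_B reaches its final 79.3%, which is consistent with the local epidemic burning out there.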

The key observable is the patch invasion count: how many of the 25 cluster_B patches have any recorded infection before tick 265. At the true k this is 11/25. The count is sensitive to k across the plausible range: k = 0.001 yields ~0–1 patches invaded; k = 0.05 yields ~11; k = 0.15 yields ~20+. A complementary, coarser signal — cluster_B's cumulative recovered fraction R(t_SIA) — rises monotonically with k and is visible to a deterministic compartmental model, making it useful for Stage 1 calibration.
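A minimal sketch of how the patch invasion count could be computed from a per-patch infectious array such as `I_by_patch.npy`. It assumes the array has shape (n_ticks, n_patches) and that the cluster_B patch indices are known; both are assumptions, and the function names and toy data are hypothetical.

```python
import numpy as np

def first_infection_tick(I_by_patch):
    """First tick with I > 0 for each patch, or -1 if the patch is never infected.

    Assumes I_by_patch has shape (n_ticks, n_patches).
    """
    infected = np.asarray(I_by_patch) > 0
    first = infected.argmax(axis=0)          # argmax is 0 for all-False columns
    first[~infected.any(axis=0)] = -1        # so mark never-infected patches
    return first

def invasion_count(I_by_patch, patch_ids, sia_tick):
    """Number of the given patches first infected strictly before sia_tick."""
    first = first_infection_tick(I_by_patch)[patch_ids]
    return int(np.sum((first >= 0) & (first < sia_tick)))

# Toy data: 10 ticks, 4 patches; patch 2 is never infected.
toy = np.zeros((10, 4), dtype=int)
toy[2:, 0] = 1      # first infected at tick 2
toy[6:, 1] = 3      # first infected at tick 6
toy[5, 3] = 1       # first infected at tick 5
count = invasion_count(toy, patch_ids=[1, 2, 3], sia_tick=6)
print(count)
```

For the experiment itself the call would use the cluster_B patch indices and `sia_tick=265`; sorting the output of `first_infection_tick` also gives the ordering needed for a per-patch invasion timeline.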

Reference Simulation

The reference dataset is a single ABM run at true parameters.

Metric	Value
Global peak infectious	tick 299
cluster_A final attack rate	98.0%
cluster_B final attack rate	79.3%
SIA baseline immunity	70%
k-attributable excess AR in cluster_B	+9.3 pp
cluster_B patches invaded before SIA	11 / 25
Inter-cluster arrival lag	41.4 ticks

The 9.3 pp gap between the SIA-floor (70%) and the realized cluster_B attack rate (79.3%) is the integrated signal of cross-cluster coupling during Phase 1. It is small enough to be k-sensitive but large enough to survive the noise of the stochastic reference.

Figure Walk-through

All nine panels are in the figure experiment_v2_summary.png.

Row 0 — Reference and Design

(0,0) Experiment timeline schematic. The yellow band marks Phase 1 (the k-signal window); the blue band marks Phase 2 (post-SIA within-cluster epidemic). The schematic highlights that the import seeds cluster_A, the gravity model leaks infections into cluster_B during Phase 1, and the SIA interrupts that process at tick 265.
(0,1) Reference ABM global I(t). A single clean epidemic peak at tick 299. Phase 1 covers the early rising edge; the peak itself and the declining tail both fall in Phase 2, after the SIA has removed a large fraction of cluster_B's remaining susceptibles.
(0,2) Cluster-level R(t). cluster_A (blue) saturates near 98%; cluster_B (red) reaches 79.3%, annotated with the 9.3 pp gap above the SIA-floor baseline. The gap is visible as a clear separation between the 70% dashed line and the cluster_B final value.

Row 1 — k Identifiability

(1,0) Per-patch invasion timeline. Each horizontal bar is one cluster_B patch, sorted by its first-infection tick. The 11 patches (red) that were seeded strictly before tick 265 are the Phase-1 k signal. The 14 gray bars represent patches that were first infected by within-cluster spread after the SIA or were never reached. This is the ABM-only observable: the compartmental model cannot resolve individual patch invasions.
(1,1) Loss surface β×k. The 8×8 CMP sweep (fixed c = 1.50) shows a clear, tight minimum near the true k = 0.05 (lime crosshairs). The orange diamond marks the Optuna CMP best (k = 0.052). Contrast with V1, where this panel showed a horizontal stripe with no variation in the k direction.
(1,2) CMP-visible k signal. Extracted from the same sweep: cluster_B's cumulative R at tick 265 as a function of k, at β ≈ 0.50. The monotonic rise from ~60% at k = 0 to ~80%+ at k = 0.20 is the mechanism that gives the CMP leverage on k. Without this signal the loss surface is flat; with it, k acquires a well-defined gradient.

Row 2 — Calibration Results

(2,0) Global I(t) fit. The CMP best (blue dashed) closely tracks the ABM reference (black) in peak height and timing. The ABM best (red dash-dot) is also a reasonable fit globally, though it over-estimates the tail duration.
(2,1) Cluster R(t) fit — CMP. The CMP recovers both cluster curves accurately, including the final 9.3 pp gap. This confirms that the loss function correctly captures the cross-cluster coupling signal.
(2,2) Parameter recovery: V1 vs V2. Bar chart of relative error for each parameter, comparing both experiments. The headline result is the k column: CMP error dropped from +481% (V1) to +4% (V2). ABM k error fell from +184% to +98% — a meaningful improvement but not yet converged. β and c are recovered well in both versions.

Calibration Results

Stage 1 — Compartmental (Optuna, 100 trials)

Parameter	True	Best-fit	Error
β	0.500	0.548	+9.6%
k	0.050	0.052	+3.8%
c	1.500	1.690	+12.6%

Loss: 0.006 (V1's loss was ~20.0 because all V1 trials hit invalid parameter space)

Stage 2 — ABM warm-start (Optuna, 40 trials × 3 seeds)

Parameter	True	Best-fit	Error
β	0.500	0.519	+3.7%
k	0.050	0.099	+98%
c	1.500	2.544	+70%

Loss: 0.856
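The Error columns are signed relative errors, 100 × (best − true) / true. The sketch below recomputes them from the tabulated values; the dictionary and function names are illustrative.

```python
TRUE = {"beta": 0.500, "k": 0.050, "c": 1.500}
STAGE1 = {"beta": 0.548, "k": 0.052, "c": 1.690}   # CMP best-fit, as tabulated
STAGE2 = {"beta": 0.519, "k": 0.099, "c": 2.544}   # ABM best-fit, as tabulated

def relative_error_pct(best, true):
    """Signed relative error in percent for each parameter."""
    return {name: 100.0 * (best[name] - true[name]) / true[name] for name in true}

for label, best in (("Stage 1", STAGE1), ("Stage 2", STAGE2)):
    errs = relative_error_pct(best, TRUE)
    print(label, {name: f"{err:+.1f}%" for name, err in errs.items()})
```

Because the best-fit values in the tables are rounded to three decimals, the recomputed errors can differ from the tabulated ones in the last digit; the tabulated errors were presumably derived from unrounded values.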

Identifiability Ratios (8×8 CMP sweep)

Parameter	Ratio	Verdict
k	0.836	Well-identified
c	0.910	Well-identified

Discussion

What worked

The V2 redesign solves the k identifiability problem at the compartmental level. By positioning the SIA mid-invasion rather than pre-epidemic, the experiment creates a monotonic relationship between k and the cluster_B R(t_SIA) observable. The CMP can detect this relationship and converge to within about 4% of the true k in 100 Optuna trials. This is the surrogate-calibration paradigm in practice: a fast deterministic model navigates parameter space using cluster-level signals, then hands off to the expensive stochastic model for patch-level refinement.
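The two-stage hand-off can be sketched with the standard library alone. Plain random search stands in for Optuna here, the losses are toy functions rather than the CMP or ABM, and the "warm start" is implemented by narrowing the search box around the Stage 1 optimum. All of these are assumptions for illustration, as are the parameter bounds and function names.

```python
import random

def random_search(loss, bounds, n_trials, rng):
    """Return (best_params, best_loss) from uniform sampling inside bounds."""
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        value = loss(params)
        if value < best_loss:
            best_params, best_loss = params, value
    return best_params, best_loss

def shrink_bounds(center, bounds, factor=0.25):
    """Narrow each interval to at most `factor` of its width around `center`."""
    out = {}
    for name, (lo, hi) in bounds.items():
        half = 0.5 * factor * (hi - lo)
        out[name] = (max(lo, center[name] - half), min(hi, center[name] + half))
    return out

TRUE = {"beta": 0.50, "k": 0.05, "c": 1.50}

def cheap_loss(p):
    """Toy smooth surrogate loss (stand-in for the compartmental model)."""
    return sum(((p[n] - TRUE[n]) / TRUE[n]) ** 2 for n in TRUE)

def make_noisy_loss(rng, n_seeds=3):
    """Toy noisy loss averaged over seeds (stand-in for the ABM)."""
    def loss(p):
        draws = [cheap_loss(p) + rng.gauss(0.0, 0.01) for _ in range(n_seeds)]
        return sum(draws) / n_seeds
    return loss

rng = random.Random(0)
bounds = {"beta": (0.1, 1.0), "k": (0.001, 0.2), "c": (0.5, 3.0)}  # hypothetical
stage1_best, stage1_loss = random_search(cheap_loss, bounds, 100, rng)
stage2_bounds = shrink_bounds(stage1_best, bounds)
stage2_best, stage2_loss = random_search(make_noisy_loss(rng), stage2_bounds, 40, rng)
print(stage1_best, stage2_best)
```

With Optuna, an alternative way to warm-start is to queue the Stage 1 optimum as an initial trial via `study.enqueue_trial(...)` before optimizing the Stage 2 study; which approach the calibration script uses is not stated here.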

What still needs work

The ABM Stage 2 recovers k with ~98% error, better than V1's 184% but far from the CMP's 4%. Two likely causes:

Noisy invasion count. The patch invasion count (the ABM-only k term) is stochastic: across three seeds with the same parameters, the count varies by ±2–4 patches. With W_INVASION = 3.0 this term dominates the loss but is too noisy to steer the Optuna search cleanly at 40 trials. More seeds, more trials, or a smoother k observable would help.
c–k correlation in Phase 2. After the SIA, the within-cluster epidemic in cluster_B involves both within-cluster transmission (influenced by c through patch-to-patch distances) and continued cross-cluster seeding (influenced by k). The ABM loss function may be trading k against c in Phase 2, since both affect the realized cluster_B final AR. Adding a loss term that specifically isolates the Phase-1 period (e.g., cluster_B I(t) for t < SIA_TICK only) could break this correlation.
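Both remedies can be sketched briefly: a loss term restricted to t < SIA_TICK, and a seed-averaged invasion count. This is a hedged illustration, not the loss used in calibrate_v2.py; the function names, the normalisation, and the toy series are assumptions.

```python
import numpy as np

SIA_TICK = 265

def phase1_loss(sim_I_B, ref_I_B, sia_tick=SIA_TICK):
    """Normalised mean squared error on cluster_B I(t), using only t < sia_tick."""
    sim = np.asarray(sim_I_B, dtype=float)[:sia_tick]
    ref = np.asarray(ref_I_B, dtype=float)[:sia_tick]
    scale = max(ref.max(), 1.0)       # guard against an all-zero Phase 1
    return float(np.mean(((sim - ref) / scale) ** 2))

def mean_invasion_count(counts_by_seed):
    """Average the stochastic invasion count over seeds before it enters the loss."""
    return float(np.mean(counts_by_seed))

# Toy series: a simulation that differs from the reference only after the SIA.
ref = np.arange(400.0)
sim_late_diff = ref.copy()
sim_late_diff[300:] += 50.0
sim_early_diff = ref.copy()
sim_early_diff[100] += 50.0

print(phase1_loss(sim_late_diff, ref), phase1_loss(sim_early_diff, ref))
print(mean_invasion_count([9, 11, 13]))
```

Because the term ignores everything from the SIA tick onward, differences that arise only in Phase 2 (where c and k both act) cannot influence it, which is the property needed to weaken the c–k trade-off.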

Design principle

The V1→V2 progression illustrates a general lesson: the experiment must be designed so that the target parameter leaves a trace in the observables before any intervention erases it. The intervention itself (the SIA) is useful precisely because it creates a contrast — but only if it fires after the parameter-dependent process has run long enough to be measurable. Optimal experiment design for spatial calibration should therefore ask: for each parameter, what is the earliest observable signature, and does the intervention window allow that signature to be read?

Files

File	Description
`generate_reference_v2.py`	Runs the reference ABM, saves `abm_reference_v2/`
`calibrate_v2.py`	Identifiability sweep + CMP + ABM calibration
`experiment_v2_summary.py`	Generates `experiment_v2_summary.png`
`experiment_v2_summary.png`	3×3 panel report figure
`sweep_identifiability_v2.png`	10×10 sweep (from calibrate_v2.py)
`calibration_diagnostics_v2.png`	Per-patch AR, arrival histograms, convergence
`abm_reference_v2/`	Reference data: `I_by_patch.npy`, `R_by_patch.npy`, `patch_summary.csv`