influence function - chunhualiao/public-docs GitHub Wiki

Here’s what the influence-function formula says, piece by piece.

What are influence functions?

In robust statistics, an influence function measures how an estimator changes in response to an infinitesimal perturbation of the data distribution. In modern ML (Koh & Liang, 2017), it is used to approximate how much a single training example $z$ would change a test loss (or a test prediction) if that example were slightly up-weighted during training.

The formula

$$ \mathcal I_{\text{up},\text{loss}}(z, z_{\text{test}}) = -\,\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}\; H_{\hat{\theta}}^{-1}\; \nabla_{\theta} L(z, \hat{\theta}) $$

What each term means

  • $z=(x,y)$: a training point.
  • $z_{\text{test}}$: a test point.
  • $L(z,\theta)$: per-example loss (e.g., cross-entropy).
  • $\hat{\theta}$: parameters after training (ERM minimizer).
  • $\nabla_{\theta} L(z, \hat{\theta})$: gradient of the training example’s loss at $\hat{\theta}$ — the direction that example wants to push the parameters.
  • $H_{\hat{\theta}}$: the Hessian of the average training loss at $\hat{\theta}$ (i.e., curvature of the objective around the solution).
  • $H_{\hat{\theta}}^{-1}$: rescales those directions by the local curvature (flat directions are amplified; high-curvature directions are damped).
  • $\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}$: says “how does a small parameter change affect the test loss?”
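
To make the pieces concrete, here is a small numerical sketch (my own toy setup, not from the original paper): a ridge-regularized logistic regression fit by gradient descent, with the influence score assembled exactly as in the formula. `np.linalg.solve` is used instead of forming $H^{-1}$ explicitly.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def per_example_grad(theta, x, y, lam):
    # Gradient of the regularized cross-entropy loss at one example (x, y).
    return (sigmoid(x @ theta) - y) * x + lam * theta

def avg_hessian(theta, X, Y, lam):
    # Hessian of the *average* training loss; the ridge term keeps it PD.
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)
    return (X.T * w) @ X / len(Y) + lam * np.eye(X.shape[1])

# Hypothetical synthetic data: 200 points in 3 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
theta_true = np.array([1.0, -2.0, 0.5])
Y = (rng.random(200) < sigmoid(X @ theta_true)).astype(float)

# Fit theta_hat by plain gradient descent on the average regularized loss.
lam = 1e-2
theta_hat = np.zeros(3)
for _ in range(5000):
    g = X.T @ (sigmoid(X @ theta_hat) - Y) / len(Y) + lam * theta_hat
    theta_hat -= 0.5 * g

# Influence of training point z = (X[0], Y[0]) on one test point.
x_test, y_test = rng.normal(size=3), 1.0
H = avg_hessian(theta_hat, X, Y, lam)
g_test = per_example_grad(theta_hat, x_test, y_test, lam)
g_train = per_example_grad(theta_hat, X[0], Y[0], lam)
influence = -g_test @ np.linalg.solve(H, g_train)  # I_up,loss(z, z_test)
```

Solving the linear system $H u = \nabla_\theta L(z,\hat\theta)$ is cheaper and numerically safer than inverting $H$.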

Interpretation

  • $\mathcal I_{\text{up},\text{loss}}(z, z_{\text{test}})$ is the first-order change in the test loss at $z_{\text{test}}$ if you up-weight the training point $z$ by an infinitesimal amount during training.

  • Sign:

    • Positive → up-weighting $z$ would increase the test loss at $z_{\text{test}}$ (harmful).
    • Negative → it would decrease the test loss (helpful).
  • Magnitude: strength of that effect.

Why the minus sign? If you up-weight $z$ by a small amount $\epsilon$ in the training objective, the ERM optimum $\hat{\theta}$ moves in the direction

$$ \frac{d\hat{\theta}}{d\epsilon}\Big|_{\epsilon=0} = -\,H_{\hat{\theta}}^{-1}\nabla_{\theta} L(z,\hat{\theta}). $$

Propagating this parameter change to the test loss via the chain rule gives the formula above.
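
Writing $\hat{\theta}_{\epsilon}$ for the minimizer of the $\epsilon$-up-weighted objective, the chain-rule step spelled out is:

$$ \frac{d}{d\epsilon}\, L(z_{\text{test}}, \hat{\theta}_{\epsilon})\Big|_{\epsilon=0} = \nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top}\, \frac{d\hat{\theta}_{\epsilon}}{d\epsilon}\Big|_{\epsilon=0} = -\,\nabla_{\theta} L(z_{\text{test}}, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\, \nabla_{\theta} L(z, \hat{\theta}) = \mathcal I_{\text{up},\text{loss}}(z, z_{\text{test}}). $$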

Common variants

  • Influence on parameters: $\mathcal I_{\text{up},\theta}(z) = -H_{\hat{\theta}}^{-1}\nabla_{\theta} L(z,\hat{\theta})$.
  • Approx. leave-one-out effect: removing $z$ (i.e., down-weighting it by $\epsilon=-\tfrac{1}{n}$) changes the test loss by about $\frac{1}{n}\nabla_{\theta} L(z_{\text{test}},\hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\nabla_{\theta} L(z,\hat{\theta})$.

Assumptions & practice notes

  • Loss is twice differentiable; $H_{\hat{\theta}}$ is (locally) invertible/PD.
  • Deep nets violate strict convexity; in practice one adds damping, $(H+\lambda I)^{-1}$, and computes Hessian–vector products (via CG or LiSSA) to avoid forming $H$ explicitly.
  • Works best near the trained optimum and for small perturbations.
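
A minimal sketch of the damped inverse-Hessian-vector product via conjugate gradients, assuming a logistic-regression model (data and names are my own): the Hessian is only ever touched through Hessian–vector products, so it is never materialized.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

# Hypothetical setup: a training set and a (pretend-fitted) parameter vector.
rng = np.random.default_rng(2)
X = rng.normal(size=(500, 10))
theta = 0.1 * rng.normal(size=10)
v = rng.normal(size=10)   # e.g., a training-example gradient
damp = 1e-2               # damping lambda

def hvp(w):
    # (H + damp*I) w without forming H: for cross-entropy,
    # H = X' diag(p(1-p)) X / n.
    p = sigmoid(X @ theta)
    return X.T @ (p * (1.0 - p) * (X @ w)) / len(X) + damp * w

def conjugate_gradient(matvec, b, iters=50, tol=1e-10):
    # Standard CG for the SPD system (H + damp*I) x = b.
    x = np.zeros_like(b)
    r = b - matvec(x)
    p_dir = r.copy()
    rs = r @ r
    for _ in range(iters):
        Ap = matvec(p_dir)
        alpha = rs / (p_dir @ Ap)
        x += alpha * p_dir
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p_dir = r + (rs_new / rs) * p_dir
        rs = rs_new
    return x

ihvp = conjugate_gradient(hvp, v)  # approximates (H + damp*I)^{-1} v
```

For deep nets, `hvp` would instead be a double-backward (Pearlmutter) product through autodiff; the CG loop itself is unchanged.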

Takeaway

The score is a fast, first-order attribution from training points to test performance: it tells you which training examples help or hurt a given test example (or an entire test set, by summing over $z_{\text{test}}$).