1707.05373 - hassony2/inria-research-wiki GitHub Wiki

Arxiv 2017

[arxiv 1707.05373] Houdini: Fooling Deep Structured Prediction Models [PDF] [notes]

Moustapha Cisse, Yossi Adi, Joseph Keshet

read 08/08/2017

Objective

Adversarial examples should aim at degrading the task loss of the application directly (PCK for pose estimation, IoU for segmentation, ...).

Some task losses are combinatorial and non-decomposable (they cannot be expressed as simple sums over the output units of the network), so they cannot be targeted directly by the gradient-based methods used to generate adversarial examples.

==> Houdini fools gradient-based machine learning models by tailoring adversarial examples to the task loss of interest

Synthesis

Neural net formalism

g_{\theta}(x, y) is the score assigned by the network with weights \theta to the sample (x, y).

The decoder of the network then predicts the output y_hat = argmax_y g_{\theta}(x, y) = y_{\theta}(x)
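A toy sketch of this decoding step over a small explicit candidate set (the function and argument names are mine, not the paper's; real structured decoders search the output space with dynamic programming or beam search):

```python
def decode(score_fn, x, labels):
    # y_hat = argmax_y g_theta(x, y): return the highest-scoring output.
    # `labels` is a small explicit candidate set here, standing in for the
    # (usually exponentially large) structured output space.
    return max(labels, key=lambda y: score_fn(x, y))
```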

l(y, y_hat) is the task loss, typically combinatorial and hard to optimize.

It is replaced by the differentiable Houdini surrogate loss

l_hat(\theta, x, y) = P_{\gamma ~ N(0,1)} [g_\theta(x, y) - g_\theta(x, y_hat) < \gamma] * l(y_hat, y)

my intuition (?): the probability term gives a direction along which to shrink the network's score margin, while the task-loss factor weights the attack towards outputs that degrade the task loss

The probability part quantifies the chance that the score of the actual target is smaller than the score of the predicted target by some margin \gamma drawn from a standard normal.

As this probability is smaller than 1, Houdini is a lower bound on the task loss.

When the score assigned by the network to the predicted output y_hat grows without bound, the probability tends to one and Houdini converges to the task loss.
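A scalar sketch of the surrogate, writing the Gaussian probability P[\delta < \gamma] = 1 - \Phi(\delta) via the complementary error function (function and argument names are mine, not the paper's):

```python
import math

def houdini_loss(score_true, score_pred, task_loss):
    # delta = g_theta(x, y) - g_theta(x, y_hat): score margin between the
    # ground-truth output y and the decoded output y_hat.
    delta = score_true - score_pred
    # P_{gamma ~ N(0,1)}[delta < gamma] = 1 - Phi(delta), via erfc.
    prob = 0.5 * math.erfc(delta / math.sqrt(2.0))
    # prob <= 1, so the surrogate lower-bounds the task loss; as the
    # predicted score grows, prob -> 1 and the surrogate -> task loss.
    return prob * task_loss
```

Both stated properties can be checked numerically: the returned value never exceeds the task loss, and it approaches the task loss as score_pred dominates score_true.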

Houdini can be differentiated with respect to g_{\theta} (the output of the network) and, by the chain rule, with respect to the input of the network.

This gives two terms, the derivatives with respect to g_\theta(x, y) and g_\theta(x, y_hat).
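Since d/d\delta [1 - \Phi(\delta)] = -\phi(\delta) (minus the standard normal density), the two score derivatives can be sketched as follows (my decomposition of the chain rule under the scalar-score assumption above, not code from the paper):

```python
import math

def houdini_grad_scores(score_true, score_pred, task_loss):
    # delta = g_theta(x, y) - g_theta(x, y_hat)
    delta = score_true - score_pred
    # Standard normal density phi(delta).
    density = math.exp(-0.5 * delta ** 2) / math.sqrt(2.0 * math.pi)
    d_true = -density * task_loss   # derivative w.r.t. g_theta(x, y)
    d_pred = density * task_loss    # derivative w.r.t. g_theta(x, y_hat)
    # Backpropagating these two terms through the network then yields the
    # gradient with respect to the input x.
    return d_true, d_pred
```

Note the two terms are equal and opposite: the surrogate only depends on the scores through their difference.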

y is fixed to some label in the case of an untargeted attack and to the target label in the case of a targeted attack.

Notes

Not clear: is y_hat updated at each iteration?