Evaluation - Simsso/NIPS-2018-Adversarial-Vision-Challenge GitHub Wiki
This wiki page lists methods and ideas that can be used to score models with respect to their robustness against adversarial attacks.
Activations
Plot a histogram of a layer's activations for normal samples and for adversarial examples, and get a sense of how the two distributions differ.
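A minimal sketch of the comparison, assuming a hypothetical random-weight ReLU layer as a stand-in for the real model and additive Gaussian noise as a stand-in for an actual attack:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hidden layer with random weights (stand-in for the real model).
W1 = rng.normal(size=(16, 8))

def activations(x):
    """Return the hidden-layer (ReLU) activations for a batch of inputs x."""
    return np.maximum(x @ W1, 0.0)

x_clean = rng.normal(size=(100, 16))                     # normal samples
x_adv = x_clean + 0.3 * rng.normal(size=x_clean.shape)   # stand-in "adversarial" perturbation

# Histogram both populations of activation values over shared bins,
# so the two distributions can be overlaid and compared.
bins = np.linspace(0, 5, 21)
hist_clean, _ = np.histogram(activations(x_clean), bins=bins, density=True)
hist_adv, _ = np.histogram(activations(x_adv), bins=bins, density=True)
```

With a real model, `activations` would hook into the layer of interest and `x_adv` would come from an actual attack such as FGSM; the two histograms would then be plotted on top of each other.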
Plotting FGSM vs. Orthogonal Vector
Plot the classification over a two-dimensional plane, where the first axis is the FGSM attack direction and the second is an orthogonal direction. A plot of that kind can be seen in the image below. Validation is done by inspecting what these plots look like: ideally, after some regularization, the two distinct halves would no longer be present.
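A sketch of how such a grid could be computed, assuming a toy linear softmax classifier (stand-in for the real network) so that the input gradient has a closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 32, 3
W = rng.normal(size=(d, k))     # toy linear classifier (stand-in for the real model)

def predict(x):
    return int(np.argmax(x @ W))

x = rng.normal(size=d)
y = predict(x)

# FGSM direction: sign of the input gradient of the cross-entropy loss.
# For a linear model this gradient is W @ (softmax(logits) - onehot(y)).
logits = x @ W
p = np.exp(logits - logits.max()); p /= p.sum()
g = np.sign(W @ (p - np.eye(k)[y]))

# Second axis: Gram-Schmidt a random vector against g to get an orthogonal direction.
r = rng.normal(size=d)
o = r - (r @ g) / (g @ g) * g
o /= np.linalg.norm(o)

# Classify every point on the 2-D grid spanned by (g, o) around x.
eps = np.linspace(-2, 2, 41)
labels = np.array([[predict(x + a * g + b * o) for b in eps] for a in eps])
```

The `labels` array is what gets rendered as the colored plane; for the real model, `predict` and the gradient would come from the network.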
Layerwise Perturbation Graph
The layerwise perturbation graph is an idea taken from last year's winning defense paper (Google Brain is also using it). For a normal input x and an adversarial example x* generated from it, the graph plots the difference of the activation outputs for every layer.
The "activation difference" for layer l is the distance between the layer's outputs for the two inputs, e.g. ‖f_l(x*) − f_l(x)‖, where f_l denotes the output of layer l.
This is a plot from the source:
Our first experiments on the topic are being tracked in #14.
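A minimal sketch of the per-layer measurement, assuming a hypothetical three-layer ReLU network and a random-sign perturbation standing in for a real adversarial example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical three-layer ReLU network (stand-in for the evaluated model).
weights = [rng.normal(size=(64, 64)) / 8 for _ in range(3)]

def layer_outputs(x):
    """Return the activations f_l(x) of every layer for input x."""
    outs = []
    for W in weights:
        x = np.maximum(x @ W, 0.0)
        outs.append(x)
    return outs

x = rng.normal(size=64)
x_adv = x + 0.05 * np.sign(rng.normal(size=64))   # stand-in adversarial example x*

# Per-layer activation difference ||f_l(x*) - f_l(x)||, normalized by
# ||f_l(x)|| so the values are comparable across layers of different scale.
diffs = [np.linalg.norm(a_adv - a) / (np.linalg.norm(a) + 1e-12)
         for a, a_adv in zip(layer_outputs(x), layer_outputs(x_adv))]
```

Plotting `diffs` against the layer index gives the perturbation graph; an amplification of the difference from layer to layer is the signature the defense papers discuss.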
Influence Functions
Paper: Understanding Black-box Predictions via Influence Functions. Using influence functions, the authors "trace a model's prediction through the learning algorithm and back to its training data, thereby identifying training points most responsible for a given prediction."
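A minimal sketch of the idea for plain linear regression (not the paper's implementation): the influence of up-weighting a training point z on the test loss is −∇L(z_test)ᵀ H⁻¹ ∇L(z), with H the Hessian of the empirical risk. For squared loss both quantities have closed forms:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

# Fit least squares: theta minimizes (1/2n) * ||X theta - y||^2.
theta = np.linalg.lstsq(X, y, rcond=None)[0]

# Hessian of the empirical risk and the gradient of each training point's loss.
H = X.T @ X / n
residual = X @ theta - y
grads = residual[:, None] * X                   # row i: (x_i theta - y_i) * x_i

x_test, y_test = rng.normal(size=d), 0.0
g_test = (x_test @ theta - y_test) * x_test     # gradient of the test loss

# Influence of up-weighting each training point on the test loss:
# -grad(z_i)^T H^{-1} grad(z_test), scaled by 1/n for a leave-one-out reading.
influence = -grads @ np.linalg.solve(H, g_test) / n

most_helpful = int(np.argmax(influence))        # training point that helps this test prediction most
```

For deep models the paper approximates the Hessian-inverse-vector product instead of solving the system directly, but the quantity being ranked is the same.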
Linear Combinations
Feed linear combinations of two inputs into the model and check whether the classification around the samples is correct. Determine the distance from an image (when linearly approaching an image of another class) at which the first misclassified input occurs. Analyze how noisy the classifications along the line are.
This figure plots the classification along the linear combination of a "1" and a "0" sample from the training data. Our first experiments can be found here.
Goodfellow presents a similar analysis here and shows that the classification works just fine in most directions, with only a few exceptions. The linear combination method might therefore not be efficient at spotting vulnerabilities of a model.
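The interpolation sweep described above can be sketched as follows, with a toy linear classifier and random vectors standing in for the real model and the "0"/"1" training samples:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 16, 2
W = rng.normal(size=(d, k))       # toy linear classifier (stand-in for the real model)

def predict(x):
    return int(np.argmax(x @ W))

x0, x1 = rng.normal(size=d), rng.normal(size=d)   # stand-ins for the two input images
y0 = predict(x0)

# Classify along the line segment (1 - t) * x0 + t * x1.
ts = np.linspace(0.0, 1.0, 101)
labels = [predict((1 - t) * x0 + t * x1) for t in ts]

# Distance from x0 to the first point whose label flips away from y0 (if any).
flips = [t for t, lab in zip(ts, labels) if lab != y0]
first_flip_dist = flips[0] * np.linalg.norm(x1 - x0) if flips else None
```

Plotting `labels` over `ts` reproduces the kind of figure shown above, and `first_flip_dist` is the distance-to-first-misclassification mentioned in the method; counting label changes along the line quantifies how noisy the boundary is.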