dl metrics

KL-Divergence

(from Wikipedia)

$$ D_{KL}(P \parallel Q) = \mathbb{E}_{P}\left[ \log\left(\frac{P}{Q}\right)\right] = -\mathbb{E}_{P}\left[ \log\left(\frac{Q}{P}\right)\right] = \int_x p(x) \log\left(\frac{p(x)}{q(x)}\right) dx = \mathbb{E}_{P}\left[ \log P \right] - \mathbb{E}_{P}\left[ \log Q \right] $$
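
A minimal numerical sketch of the discrete case (the function name and the clipping epsilon are illustrative choices, not from any particular library):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) for discrete distributions given as probability vectors."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    # eps guards log(0); assumes q > 0 wherever p > 0 (absolute continuity).
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Example: two biased coins.
print(kl_divergence([0.7, 0.3], [0.5, 0.5]))  # ~0.0823 nats
```

Note that the result is in nats and the divergence is not symmetric in its arguments.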

Wasserstein

Following the Kantorovich-Rubinstein duality - taken from the Wasserstein GAN paper Arjovsky17icml - under the paper's assumptions and notation:

$$ \max_{\lVert f \rVert_L \leq 1} \mathbb{E}_{x\sim \mathbb{P}_r}\left[f(x)\right] - \mathbb{E}_{x\sim \mathbb{P}_\theta}\left[f(x)\right] $$

$$ \max_{\lVert f \rVert_L \leq 1} \mathbb{E}_{x\sim \mathbb{P}_r}\left[f_w(x)\right] - \mathbb{E}_{z\sim p(z)}\left[f_w(g_\theta(z))\right] $$
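
A sketch of the sampled objective, assuming f_w (the critic) and g_theta (the generator) are user-supplied PyTorch modules and the Lipschitz constraint is enforced elsewhere (the paper does it by weight clipping):

```python
import torch

def wgan_critic_objective(f_w, g_theta, x_real, z):
    # Monte Carlo estimate of E_{x~P_r}[f_w(x)] - E_{z~p(z)}[f_w(g_theta(z))].
    # f_w is assumed to be (approximately) 1-Lipschitz; the original WGAN
    # enforces this by clipping the critic weights after each update.
    return f_w(x_real).mean() - f_w(g_theta(z)).mean()
```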

N-Pairs

Introduced in Sohn16nips. Used in e.g. Pirk19arxiv. Analyzed in a 2020 Hinton paper Hinton20arxiv.

$$ \mathcal{L}_{\text{N-pair-mc}} \left( \left\{ (x_i, x_i^{\pmb +}) \right\}_{i=1}^N; f \right) = \frac{1}{N} \sum_{i=1}^N \log \left( 1 + \sum_{j\neq i} \exp \left( f_i^\top f_j^+ - f_i^\top f_i^+ \right) \right) $$ where $\left\{ (x_i, x_i^{\pmb +}) \right\}_{i=1}^N$ is the set of matching pairs.
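
A sketch in PyTorch, assuming f and f_pos are (N, D) tensors of anchor and positive embeddings. It uses the identity that the N-pairs loss equals softmax cross-entropy over the pairwise score matrix with the diagonal as targets:

```python
import torch
import torch.nn.functional as F

def n_pair_loss(f, f_pos):
    # logits[i, j] = f_i . f_j^+ ; the diagonal holds the matching-pair scores.
    logits = f @ f_pos.t()                                  # (N, N)
    # log(1 + sum_{j!=i} exp(l_ij - l_ii)) == cross-entropy with target i,
    # averaged over i by F.cross_entropy's default mean reduction.
    targets = torch.arange(f.size(0), device=f.device)
    return F.cross_entropy(logits, targets)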

Contrastive Loss

As presented in Sohn16nips. Introduced in Hadsell05cvpr, Hadsell06cvpr.

$$ \mathcal{L}_{\text{cont}}^{m} = \pmb{1}_{y_i=y_j}\lVert f_i - f_j \rVert_2^2 + \pmb{1}_{y_i\neq y_j}\max\left(0, m-\lVert f_i - f_j \rVert_2\right)^2 $$
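
A sketch for a batch of pairs, assuming f_i, f_j are (N, D) embedding tensors, same is a boolean tensor marking y_i == y_j, and m is the margin:

```python
import torch

def contrastive_loss(f_i, f_j, same, m=1.0):
    d = torch.norm(f_i - f_j, p=2, dim=-1)          # Euclidean distance per pair
    pos = d.pow(2)                                  # matching pairs: pull together
    neg = torch.clamp(m - d, min=0).pow(2)          # non-matching: push past margin m
    return torch.where(same, pos, neg).mean()
```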

Triplet Loss

As presented in Sohn16nips. Introduced in Weinberger09jmlr.

$$ \mathcal{L}_{\text{tri}}^m(x, x^+, x^-; f) = \max \left( 0, \lVert f-f^+ \rVert_2^2 - \lVert f-f^- \rVert_2^2 + m \right) $$
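
A sketch for a batch of triplets, assuming f, f_pos, f_neg are (N, D) tensors of anchor, positive, and negative embeddings (PyTorch also ships torch.nn.TripletMarginLoss, which uses unsquared distances by default):

```python
import torch

def triplet_loss(f, f_pos, f_neg, m=1.0):
    d_pos = (f - f_pos).pow(2).sum(dim=-1)          # squared distance to positive
    d_neg = (f - f_neg).pow(2).sum(dim=-1)          # squared distance to negative
    return torch.clamp(d_pos - d_neg + m, min=0).mean()
```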

Inception Score

Introduced in Salimans16nips. Explanation taken from Barratt18arxiv.

The IS uses an Inception-v3 network pre-trained on ImageNet and calculates a statistic of the network's outputs when applied to generated images.

$$ IS(G) = \exp\left(\mathbb{E}_{x\sim p_g}\left[ D_{KL}\left(p(y\mid x) \parallel p(y)\right) \right] \right) $$
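
A sketch of the statistic itself, assuming p_yx is an (N, num_classes) array of Inception-v3 softmax outputs already computed on N generated images (the common protocol also averages over several splits, which is omitted here):

```python
import numpy as np

def inception_score(p_yx, eps=1e-12):
    p_y = p_yx.mean(axis=0, keepdims=True)   # marginal p(y) over the sample
    # Mean KL(p(y|x) || p(y)) per image, then exponentiate.
    kl = np.sum(p_yx * (np.log(p_yx + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))
```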

Fréchet Inception Distance

(from Wikipedia) $$ \text{FID} = \lVert \mu - \mu_w \rVert_2^2 + \operatorname{tr}\left(\Sigma + \Sigma_w - 2\left(\Sigma \Sigma_w\right)^{1/2}\right) $$

Where $N(\mu, \Sigma)$ is (a Gaussian approximation to) the distribution of generated images / feature maps, and $N(\mu_w, \Sigma_w)$ is (an approximation to) the distribution of real images / feature maps.

This metric can be computed directly on images, or on feature maps taken from intermediate layers of the network (usually deeper layers, close to the output).
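
A sketch from fitted Gaussian statistics, assuming mu, sigma and mu_w, sigma_w are the mean vectors and covariance matrices of the generated and real features respectively:

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(mu, sigma, mu_w, sigma_w):
    covmean = sqrtm(sigma @ sigma_w)
    # sqrtm can return small imaginary components due to numerical error.
    if np.iscomplexobj(covmean):
        covmean = covmean.real
    diff = mu - mu_w
    return float(diff @ diff + np.trace(sigma + sigma_w - 2.0 * covmean))
```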

Fisher Information Matrix

(from Wikipedia) $$ \mathcal{I}(\theta) = \mathbb{E}_{x}\left\lbrace \nabla_\theta \log f(X; \theta)\cdot\nabla_\theta \log f(X; \theta)^\top \right\rbrace $$ Or, element-wise: $$ \left[\mathcal{I}(\theta)\right]_{i,j} = \mathbb{E}_{x}\left\lbrace \left(\frac{\partial }{\partial \theta_i}\log f(X; \theta)\right)\cdot\left(\frac{\partial }{\partial \theta_j}\log f(X; \theta)\right) \right\rbrace $$
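
A Monte Carlo sketch, assuming score_fn(x) returns the score $\nabla_\theta \log f(x; \theta)$ as a 1-D array and samples are draws from $f(X; \theta)$ (both are placeholders for whatever model is at hand):

```python
import numpy as np

def fisher_information(score_fn, samples):
    # Average outer product of the score over samples X ~ f(X; theta).
    scores = np.stack([score_fn(x) for x in samples])   # (N, d)
    return scores.T @ scores / len(samples)
```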