Latent Space Autoregression for Novelty Detection
Referenced from http://arxiv.org/abs/1807.01653
The model consists of a deep autoencoder coupled with a density estimator that learns the distribution of the latent representation using an autoregressive technique. The aim of training is to maximize the memorization of patterns from normal samples while minimizing the surprisal of the latent representation (i.e., preventing the latent vector from taking an arbitrary, low-probability configuration).
Minimizing surprisal is performed by maximizing the likelihood of the latent representation under the autoregressive estimator. When the model is trained to optimize both terms together, it effectively looks for a minimum-entropy representation without any loss of reconstruction power.
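In symbols, a sketch of this joint objective (the weighting factor λ and the reconstruction symbol x̃ are notational assumptions here, not copied verbatim from the paper):

```latex
% Joint objective: reconstruction error plus surprisal (negative log-likelihood)
% of the latent code z under the autoregressive estimator; \lambda balances the two.
\mathcal{L}(x) = \underbrace{\lVert x - \tilde{x} \rVert^{2}}_{\text{reconstruction}}
  \;+\; \lambda \underbrace{\bigl(-\log p(z)\bigr)}_{\text{surprisal of } z}
```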
The architecture consists of the following components (a minimal sketch follows the list):

- Encoder - converts the input to a compressed (latent) representation. It is built from downsampling residual blocks, with a fully connected layer as the last layer.
- Decoder - reconstructs the input from the compressed representation. It consists of upsampling residual blocks.
- Probabilistic model - estimates the density of the latent representation z using an autoregressive technique, which avoids committing to a fixed prior distribution that might not be beneficial for the given task.
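A minimal PyTorch sketch of this three-part layout. The channel sizes, the latent dimension, the 28x28 input, and plain convolutions standing in for the residual blocks are all illustrative assumptions; the autoregressive estimator h is sketched further below:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Downsampling stack ending in a fully connected layer -> latent z."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),   # 28 -> 14
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),  # 14 -> 7
        )
        self.fc = nn.Linear(64 * 7 * 7, latent_dim)  # last layer is fully connected

    def forward(self, x):
        return self.fc(self.conv(x).flatten(1))

class Decoder(nn.Module):
    """Upsampling stack reconstructing the input from z."""
    def __init__(self, latent_dim=64):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 64 * 7 * 7)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 7 -> 14
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1),              # 14 -> 28
        )

    def forward(self, z):
        return self.deconv(self.fc(z).view(-1, 64, 7, 7))

x = torch.randn(8, 1, 28, 28)   # e.g. MNIST-sized input (assumption)
enc, dec = Encoder(), Decoder()
z = enc(x); x_rec = dec(z)      # z: (8, 64), x_rec: (8, 1, 28, 28)
```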
Autoregressive density estimation is performed using the technique below:
It is an instance of the general formulation for tasks with sequential predictions, in which each output depends on the previous ones. The model takes a latent vector as input and outputs conditional probability densities (CPDs) for each of its variables.
The estimation of the distribution of z is factorized into conditional probability densities: p(z) = ∏_{i=1}^{d} p(z_i | z_{<i}). The estimator h outputs the parameters of the d distributions p(z_i | z_{<i}), where each CPD is modeled as a multinomial over 100 bins.
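A sketch of how such a factorized likelihood could be evaluated; the bin count, the [0, 1] value range of z, and the `logits` tensor produced by an estimator h are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def latent_log_likelihood(z, logits, num_bins=100):
    """log p(z) = sum_i log p(z_i | z_<i), each CPD a multinomial over bins.

    z:      (batch, d) latent vectors, assumed squashed into [0, 1]
    logits: (batch, d, num_bins) unnormalized CPD parameters from estimator h
    """
    log_probs = F.log_softmax(logits, dim=-1)                # normalize each CPD
    bins = (z.clamp(0, 1) * (num_bins - 1)).round().long()   # quantize each z_i to a bin
    picked = log_probs.gather(-1, bins.unsqueeze(-1)).squeeze(-1)  # log p(z_i | z_<i)
    return picked.sum(dim=-1)                                # sum over the d dimensions
```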
All three components are trained together to minimize the loss, which is composed of a reconstruction term and a log-likelihood term. The log-likelihood term acts like a KL divergence, keeping the gap between the parametric autoregressive model and the true latent distribution produced by the encoder small. Overall, the framework leads to the minimization of the differential entropy of the latent distribution.
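Putting it together, a sketch of the joint loss under the same assumptions, reusing `latent_log_likelihood` from the sketch above (the weighting `lam` is an illustrative hyperparameter):

```python
import torch.nn.functional as F

def joint_loss(x, x_rec, z, logits, lam=1.0):
    """Reconstruction term + negative log-likelihood term, minimized jointly.
    Driving -log p(z) down is what pushes the differential entropy of the
    latent distribution toward its minimum."""
    rec = F.mse_loss(x_rec, x)                       # reconstruction error
    nll = -latent_log_likelihood(z, logits).mean()   # surprisal of the latent codes
    return rec + lam * nll
```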
Along with memorization capability, reflected in the minimization of the reconstruction error, the proposal also considers the likelihood of a sample's latent representation under the learned prior. Moreover, minimizing the differential entropy of the latent distribution increases the model's discriminative capability.
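At test time, the two signals can be combined into a single novelty score. A sketch under the same assumptions; the attribute names `encoder`, `decoder`, `estimator` and the plain unweighted sum are hypothetical (in practice each term is typically normalized before combining):

```python
def novelty_score(x, model):
    """Higher score = more novel: poor reconstruction and/or a surprising latent."""
    z = model.encoder(x)
    x_rec = model.decoder(z)
    logits = model.estimator(z).view(z.shape[0], z.shape[1], -1)  # CPD logits per dim
    rec_err = ((x - x_rec) ** 2).flatten(1).sum(-1)   # per-sample reconstruction error
    surprisal = -latent_log_likelihood(z, logits)     # per-sample -log p(z)
    return rec_err + surprisal
```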
As noted above, the encoder and decoder are built from downsampling and upsampling residual blocks respectively, with a fully connected layer as the encoder's last layer.
Autoregressive layer - to preserve the autoregressive nature of each CPD output, the connectivity in every layer of the estimator h must be properly restricted. The encoder provides a feature vector of dimension d, and the autoregressive estimator is composed of stacked masked fully connected layers, so that the output for z_i depends only on z_{<i}.
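A sketch of such a masked fully connected layer and a stacked estimator built from it (MADE-style binary masks; the hidden multiplicity `k` and all sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    """Fully connected layer with a binary mask enforcing autoregressive
    connectivity: units assigned to dimension i only receive input from
    units of dimension < i (strict, first layer) or <= i (later layers)."""
    def __init__(self, d, in_mult, out_mult, strict):
        super().__init__(d * in_mult, d * out_mult)
        in_deg = torch.arange(d).repeat_interleave(in_mult)    # dimension index of inputs
        out_deg = torch.arange(d).repeat_interleave(out_mult)  # dimension index of outputs
        cmp = torch.lt if strict else torch.le
        self.register_buffer("mask", cmp(in_deg[None, :], out_deg[:, None]).float())

    def forward(self, x):
        return F.linear(x, self.weight * self.mask, self.bias)

d, k, bins = 64, 4, 100  # latent dim, hidden multiplicity, CPD bins (assumptions)
estimator = nn.Sequential(
    MaskedLinear(d, 1, k, strict=True), nn.ReLU(),
    MaskedLinear(d, k, k, strict=False), nn.ReLU(),
    MaskedLinear(d, k, bins, strict=False),  # logits of the 100-bin multinomial per z_i
)
logits = estimator(torch.randn(8, d)).view(8, d, bins)  # CPD parameters for each z_i
```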