Data Imputation via Variational Inference - KCL-BMEIS/Methods_JournalClub GitHub Wiki

Presented by Liane

8th of June 2019

Unsupervised Data Imputation via Variational Inference of Deep Subspaces

Adrian V. Dalca et al. (2019)

A wide range of systems exhibits high dimensional incomplete data. Accurate estimation of the missing data is often desired and is crucial for many downstream analyses. Many state-of-the-art recovery methods involve supervised learning using datasets containing full observations. In contrast, we focus on the unsupervised estimation of missing image data, where no full observations are available - a common situation in practice. Unsupervised imputation methods for images often employ a simple linear subspace to capture correlations between data dimensions, omitting more complex relationships. In this work, we introduce a general probabilistic model that describes sparse high dimensional imaging data as being generated by a deep non-linear embedding. We derive a learning algorithm using a variational approximation based on convolutional neural networks and discuss its relationship to linear imputation models, the variational autoencoder, and deep image priors. We introduce sparsity-aware network building blocks that explicitly model observed and missing data. We analyze proposed sparsity-aware network building blocks, evaluate our method on public domain imaging datasets, and conclude by showing that our method enables imputation in an important real-world problem involving medical images.
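
The masked-likelihood idea at the heart of the method can be sketched in a few lines: fit a deep latent-variable model while scoring only the observed entries, so missing pixels contribute nothing to the loss. This is a toy numpy illustration, not the authors' code; the convolutional decoder is replaced by a hypothetical linear map, and the names (`masked_gaussian_elbo`, `decode`) are made up for this example.

```python
import numpy as np

def masked_gaussian_elbo(x, mask, mu_z, logvar_z, decode, sigma=1.0):
    """Evidence lower bound whose reconstruction term sums only over
    observed entries (mask == 1), as in unsupervised imputation."""
    # Reparameterised sample: z = mu + std * eps
    eps = np.random.randn(*mu_z.shape)
    z = mu_z + np.exp(0.5 * logvar_z) * eps
    x_hat = decode(z)
    # Gaussian log-likelihood restricted to observed entries
    recon = -0.5 * np.sum(mask * ((x - x_hat) / sigma) ** 2)
    # KL(q(z|x) || N(0, I)) in closed form
    kl = -0.5 * np.sum(1 + logvar_z - mu_z ** 2 - np.exp(logvar_z))
    return recon - kl

# Toy example: 4 samples of 6-D data, roughly half the entries missing
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
mask = (rng.random((4, 6)) > 0.5).astype(float)
W = rng.normal(size=(2, 6))          # hypothetical linear "decoder"
decode = lambda z: z @ W
mu_z, logvar_z = np.zeros((4, 2)), np.zeros((4, 2))
elbo = masked_gaussian_elbo(x, mask, mu_z, logvar_z, decode)
print(elbo)
```

Imputation then amounts to decoding the inferred latent code and reading off the entries where `mask == 0`.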

Paper here


Discussion Points

  • Is it possible to use this approach for the estimation of biomarkers instead of images? Yes. Would it still use deep learning?

  • Can the decoder and encoder be replaced by kernel functions?

  • Can it be included in a classification task? Yes.

  • How can multiple imputations be performed from different tasks, i.e., different sources of information being used to estimate the missing values?

  • How much impact does this have on images that should contain intensity-based features?

  • Can it be reformulated as a disease progression model?

Pros

  • Unsupervised imputation of missing data using a general probabilistic model that describes sparse high dimensional imaging data.

Drawbacks

  • The likelihood is intractable, so approximations (a variational lower bound) are required.
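
On the intractability point: for a deep nonlinear decoder, the marginal likelihood of the observed voxels has no closed form, so training maximises a variational lower bound instead (the standard ELBO, written here over the observed entries only; the notation is illustrative rather than copied from the paper):

```latex
\log p(x_{\text{obs}})
  \ge \mathbb{E}_{q(z \mid x_{\text{obs}})}\!\left[\log p(x_{\text{obs}} \mid z)\right]
    - \mathrm{KL}\!\left(q(z \mid x_{\text{obs}}) \,\|\, p(z)\right)
```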

Applications to MI field

  • Synthesis of missing MRI modalities;
  • Imputation of missing values/voxels, such as biomarkers;
  • Particularly useful for small datasets.

Presentation

Presentation here