Data Imputation via Variational Inference - KCL-BMEIS/Methods_JournalClub GitHub Wiki

Presented by Liane

8th of June 2019

Unsupervised Data Imputation via Variational Inference of Deep Subspaces

Adrian V. Dalca et al. (2019)

A wide range of systems exhibits high dimensional incomplete data. Accurate estimation of the missing data is often desired and is crucial for many downstream analyses. Many state-of-the-art recovery methods involve supervised learning using datasets containing full observations. In contrast, we focus on the unsupervised estimation of missing image data, where no full observations are available - a common situation in practice. Unsupervised imputation methods for images often employ a simple linear subspace to capture correlations between data dimensions, omitting more complex relationships. In this work, we introduce a general probabilistic model that describes sparse high dimensional imaging data as being generated by a deep non-linear embedding. We derive a learning algorithm using a variational approximation based on convolutional neural networks and discuss its relationship to linear imputation models, the variational autoencoder, and deep image priors. We introduce sparsity-aware network building blocks that explicitly model observed and missing data. We analyze proposed sparsity-aware network building blocks, evaluate our method on public domain imaging datasets, and conclude by showing that our method enables imputation in an important real-world problem involving medical images.
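
The masked-likelihood idea at the heart of the method can be sketched in a few lines: fit a deep latent-variable model while scoring only the observed entries, so missing pixels contribute nothing to the loss. This is a toy numpy illustration, not the authors' code; the convolutional decoder is replaced by a hypothetical linear map, and the names (`masked_gaussian_elbo`, `decode`) are made up for this example.

```python
import numpy as np

def masked_gaussian_elbo(x, mask, mu_z, logvar_z, decode, sigma=1.0):
    """Evidence lower bound whose reconstruction term sums only over
    observed entries (mask == 1), as in unsupervised imputation."""
    # Reparameterised sample: z = mu + std * eps
    eps = np.random.randn(*mu_z.shape)
    z = mu_z + np.exp(0.5 * logvar_z) * eps
    x_hat = decode(z)
    # Gaussian log-likelihood restricted to observed entries
    recon = -0.5 * np.sum(mask * ((x - x_hat) / sigma) ** 2)
    # KL(q(z|x) || N(0, I)) in closed form
    kl = -0.5 * np.sum(1 + logvar_z - mu_z ** 2 - np.exp(logvar_z))
    return recon - kl

# Toy example: 4 samples of 6-D data, roughly half the entries missing
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
mask = (rng.random((4, 6)) > 0.5).astype(float)
W = rng.normal(size=(2, 6))          # hypothetical linear "decoder"
decode = lambda z: z @ W
mu_z, logvar_z = np.zeros((4, 2)), np.zeros((4, 2))
elbo = masked_gaussian_elbo(x, mask, mu_z, logvar_z, decode)
print(elbo)
```

Imputation then amounts to decoding the inferred latent code and reading off the entries where `mask == 0`.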

Paper here


Discussion Points

  • Is it possible to use this approach for the estimation of biomarkers instead of images? Yes. Would it still use deep learning?

  • Can the decoder and encoder be replaced by kernel functions?

  • Can it be included in a classification task? Yes.

  • How can multiple imputations be performed from different tasks, i.e., different sources of information being used to estimate the missing values?

  • How much impact does this have on images that should contain intensity-based features?

  • Can it be reformulated as a disease progression model?

Pros

  • Unsupervised imputation of missing data using a general probabilistic model that describes sparse high dimensional imaging data.

Drawbacks

  • The likelihood is intractable, so approximations (a variational lower bound) are required.
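
On the intractability point: for a deep nonlinear decoder, the marginal likelihood of the observed voxels has no closed form, so training maximises a variational lower bound instead (the standard ELBO, written here over the observed entries only; the notation is illustrative rather than copied from the paper):

```latex
\log p(x_{\text{obs}})
  \ge \mathbb{E}_{q(z \mid x_{\text{obs}})}\!\left[\log p(x_{\text{obs}} \mid z)\right]
    - \mathrm{KL}\!\left(q(z \mid x_{\text{obs}}) \,\|\, p(z)\right)
```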

Applications to MI field

  • Synthesis of missing MRI modalities;
  • Imputation of missing values/voxels, such as biomarkers;
  • Particularly useful for small datasets.

Presentation

Presentation here