Data Imputation via Variational Inference - KCL-BMEIS/Methods_JournalClub GitHub Wiki
Presented by Liane
8th of June 2019
Unsupervised Data Imputation via Variational Inference of Deep Subspaces
Adrian V. Dalca et al. (2019)
A wide range of systems exhibits high dimensional incomplete data. Accurate estimation of the missing data is often desired and is crucial for many downstream analyses. Many state-of-the-art recovery methods involve supervised learning using datasets containing full observations. In contrast, we focus on the unsupervised estimation of missing image data, where no full observations are available - a common situation in practice. Unsupervised imputation methods for images often employ a simple linear subspace to capture correlations between data dimensions, omitting more complex relationships. In this work, we introduce a general probabilistic model that describes sparse high dimensional imaging data as being generated by a deep non-linear embedding. We derive a learning algorithm using a variational approximation based on convolutional neural networks and discuss its relationship to linear imputation models, the variational autoencoder, and deep image priors. We introduce sparsity-aware network building blocks that explicitly model observed and missing data. We analyze proposed sparsity-aware network building blocks, evaluate our method on public domain imaging datasets, and conclude by showing that our method enables imputation in an important real-world problem involving medical images.
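As a rough illustration of what "explicitly modelling observed and missing data" can mean for the training objective, the sketch below evaluates a single-sample negative ELBO in which the reconstruction term only counts observed entries. It is a minimal sketch and not the authors' implementation: the Gaussian likelihood with fixed `sigma`, the standard normal prior, and all function names are assumptions made for illustration, and the decoder output `x_hat` and the encoder outputs `mu` and `log_var` are simply passed in as plain lists.

```python
import math


def masked_gaussian_nll(x, x_hat, mask, sigma=1.0):
    """Gaussian negative log-likelihood summed over observed entries only.

    Entries where mask is 0 (missing) contribute nothing to the loss.
    """
    per_entry_const = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    total = 0.0
    for xi, xhi, mi in zip(x, x_hat, mask):
        if mi:
            total += 0.5 * ((xi - xhi) / sigma) ** 2 + per_entry_const
    return total


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


def negative_elbo(x, mask, x_hat, mu, log_var, sigma=1.0):
    """Single-sample negative ELBO: masked reconstruction term plus KL term."""
    return masked_gaussian_nll(x, x_hat, mask, sigma) + kl_to_standard_normal(mu, log_var)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0]      # the middle value is treated as missing
    mask = [1, 0, 1]         # 1 = observed, 0 = missing
    x_hat = [0.9, 7.0, 3.2]  # hypothetical decoder output
    print(negative_elbo(x, mask, x_hat, mu=[0.1], log_var=[-0.2]))
```

Because the missing entry is masked out, whatever the decoder predicts there does not change the loss; after training, that prediction would serve as the imputed value.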
Discussion Points
- Is it possible to use this approach to estimate biomarkers instead of images? Yes. Still using deep learning?
- Can the decoder and encoder be replaced by kernel functions?
- Can it be included in a classification task? Yes.
- How can multiple imputations from different tasks be combined, i.e., with different sources of information used to estimate the missing values?
- How much impact does this have on images that should contain some intensity-based features?
- Can it be reformulated as a disease progression model?
Pros
- Unsupervised imputation of missing data using a general probabilistic model that describes sparse high dimensional imaging data.
Drawbacks
- The likelihood is intractable, so learning requires a variational approximation.
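For reference, the standard variational workaround replaces the intractable log-likelihood with a lower bound. The notation here is assumed for illustration and is not taken from the paper: $x_{\mathcal{O}}$ denotes the observed entries, $z$ the latent embedding, $p_\theta$ the decoder, and $q_\phi$ the approximate posterior (encoder).

```latex
\log p_\theta(x_{\mathcal{O}})
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x_{\mathcal{O}})}\!\left[ \log p_\theta(x_{\mathcal{O}} \mid z) \right]
  - \mathrm{KL}\!\left( q_\phi(z \mid x_{\mathcal{O}}) \,\|\, p(z) \right)
```

The gap between the two sides is the KL divergence between $q_\phi(z \mid x_{\mathcal{O}})$ and the true posterior, so the quality of the approximation depends on how expressive the encoder is.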
Applications to the medical imaging (MI) field
- Synthesis of missing MRI modalities.
- Imputation of missing values or voxels, such as biomarkers.
- Particularly useful for small datasets.