ML4EO ‐ MVP - jejjohnson/research_journal_v2 GitHub Wiki

Interpolation

Ideas

Patch-Based GPR
EOF Priors with Probabilistic PCA
GP Priors with Conditional Flows
L2 Data Priors with Conditional Flows
L3 Data Priors with Conditional Flows
Deep Equilibrium Models
Dynamical Emulators

Gaussian Process

The upside of interpolation methods is that they are physically consistent, i.e., points that are close together should have similar values. The downside is that they are not causally consistent, especially with respect to time. In addition, coordinate-based methods have trouble capturing multiscale activity because doing so requires a very large number of samples, which quickly becomes expensive.
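
As a rough illustration of the "closer points should be similar" idea, here is a minimal from-scratch GP regression sketch in NumPy. It assumes an RBF kernel with hand-picked hyperparameters and a synthetic 1D sine signal with an artificial gap; the function names (rbf_kernel, gp_posterior) and all numerical settings are hypothetical choices for this example, not part of any library mentioned on this page.

```python
import numpy as np


def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """RBF kernel: similarity decays smoothly with squared distance."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_train, y_train, x_test, length_scale=1.0, variance=1.0, noise=1e-2):
    """Posterior mean and pointwise variance of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train, length_scale, variance)
    K = K + noise * np.eye(len(x_train))  # observation noise on the diagonal
    K_s = rbf_kernel(x_train, x_test, length_scale, variance)
    L = np.linalg.cholesky(K)
    # Solve K alpha = y with two triangular solves
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = variance - np.sum(v**2, axis=0)  # prior variance minus explained part
    return mean, var


# Synthetic 1D signal with a gap around x = 5 (assumed toy setup)
rng = np.random.default_rng(0)
x_all = np.linspace(0.0, 10.0, 41)
x_train = x_all[np.abs(x_all - 5.0) > 1.0]
y_train = np.sin(x_train) + 0.05 * rng.standard_normal(x_train.shape)
x_test = np.linspace(0.0, 10.0, 101)

gp_mean, gp_var = gp_posterior(x_train, y_train, x_test, length_scale=1.0, noise=0.05**2)
gp_error = np.mean(np.abs(gp_mean - np.sin(x_test)))
```

The predictive variance grows inside the gap, where no nearby observations constrain the field. Note that the dense Cholesky factorization scales cubically with the number of training points, which is one reason coordinate-based methods become expensive.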

EOF

We use the classic DINEOF algorithm to perform gap-filling on missing data.

Resources

EUMETSAT Training
Linear Algebra with xarray - einstats
Computational Linear Algebra - FastAI
Randomized SVD - scikit-learn | Cola - TBD | rSVD - JAX
Truncated SVD - scikit-learn
PCA - scikit-learn
Many Matrix Factorization & Decomposition Algorithms - scikit-learn
Scalable Eigenvalue Decomposition on GPU - Cola - Docs
Adjoint Matrix Linear Operator - Cola - Adjoint | Cola - Linear Operator Method
Overview - POD, DMD, etc - libROM

Missing Data Challenges

Losses with masks - MVN - BayesNewton | numpyro
Initialization with missing data
Land-Ocean Masks - xeofs - nan types | xeofs - Sanitizer

Baseline

Classic DINEOF algorithm
Iterative Updates
Scalable Iterative Eigenvalue Decomposition
Covariance Matrix Regularizers (Laplacian)
Equilibrium Model Formulation

Examples

Simplest Example (no fast eigenvalue solver) - tieof
Simple Example (no fast eigenvalue solver) - PyPlume

Strategy

Parameter Estimation w/ Probabilistic PCA
State Estimation w/ PPCA AutoEncoder
Latent State Estimation w/ PPCA Decoder

Field Initialization

Mean
Partial Convolutions - Astropy

Weight Initialization

scikit-learn - PCA

Tutorials - GP

Locality - K-Nearest Neighbours (Unstructured, Semantic) vs Radius Neighbours (Structured)
Weighted Distances - Inverse, RBF
Scale - Algorithm (KD-Tree, Ball-Tree, R-Tree, PyNNDescent)
Scale - Hardware (Parallel CPU, GPU)
Kernel Density Estimation
GPs from scratch
GPs with a PPL
Scale - Algorithm (Subset, Approximate Kernel - inversion, logdet), Hardware (GPU)
Reduced Points - Sparse GP (Fixed vs Variable)
Locality at Scale - Patches - Split-Apply-Combine (Patch Size, Stride, DataLoader, Weighted Stitching)
MegaScale Patching from the cloud
Patches - MegaScale Combination with memory issues
Patches vs Neighbours
Patch-Based - GP & SparseGP Interpolation
Linear Regression
Basis Function - Polynomial, RBF, Spherical Harmonics
Neural Fields

Tutorials

In this tutorial, we will look at feature extractors as a way to fill in the gaps. We will start with the simplest method, PCA/EOFs/POD, which is a parametric linear method, and apply it to missing data. Afterwards, we will enhance this method by using more non-linear representations.

PCA From Scratch - EOFs Perspective - SVD or Eigs | A Tutorial on the Proper Orthogonal Decomposition
Scalable PCA From Scratch - Jax + GPU + Randomized PCA | Randomized Eigs
PCA w/ scikit-learn - API, Scale
PCA as a minimization problem
PCA w/ Observation Operator (Missing Data) - DEQ vs BiLevelOpt | Amortization Tutorial
DINEOF w/ Missing Data
AE with missing data - CNN AE Denoiser - Keras | MAE - keras
PPCA from scratch - PPCA w/ EM | has PPCA w/ EM - Tipping | PPCA w/ EM (clear)
PPCA w/ Missing Data - PPCA + EM
PPCA with Numpyro - MLE, MAP, VI, MCMC
State Estimation w/ PPCA AE
Latent State Estimation w/ PPCA Decoder
VAE from scratch
VAE generalization - Conditional Flow Model w/ Stochastic Transformations
State Estimation w/ Conditional Flow
Latent State Estimation w/ Conditional Flow
Patch-Based Conditional Flow
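
As a starting point for the first item in this list, PCA/EOFs can be sketched from scratch with an SVD of the centered data matrix. The sketch below assumes rows are samples (e.g. time steps) and columns are features (e.g. grid points), and uses synthetic data; the function name pca_svd and the variable names are hypothetical.

```python
import numpy as np


def pca_svd(X, n_components):
    """PCA via SVD of the centered data matrix (rows = samples)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]  # EOFs: orthonormal spatial patterns
    scores = U[:, :n_components] * s[:n_components]  # PCs: per-sample coefficients
    explained_variance = s[:n_components] ** 2 / (X.shape[0] - 1)
    return mean, components, scores, explained_variance


# Synthetic data: 2 latent factors mixed into 6 features plus small noise
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
mixing = rng.standard_normal((2, 6))
X = latent @ mixing + 0.01 * rng.standard_normal((200, 6))

mean, eofs, pcs, var = pca_svd(X, n_components=2)
X_hat = mean + pcs @ eofs  # rank-2 reconstruction

# Equivalent eigenvalue view: leading eigenvalues of the sample covariance
cov_eigvals = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1][:2]
```

The squared singular values divided by N - 1 agree with the leading eigenvalues of the sample covariance, which is the "SVD or Eigs" equivalence referred to in the list.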

Tutorials - Dynamical Models

In this tutorial, we will work with dynamical models. We will look at the anatomy of a state space model to understand all of its pieces. Then we will look at the Dynamic Mode Decomposition as the simplest starting point. Finally, we will look at some more non-linear structures, taking inspiration from numerical methods.

Anatomy of a State Space Model - Initial Condition, Dynamical Model, Measurement Model
DMD From Scratch - Video
Scaling -> Randomized SVD, Scalable Eigenvalue Decomposition
DMD as a Minimization Problem - OptDMD
DMD w/ Missing Observations
Markovian Gaussian Process
Conditional Flows
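
For the "DMD From Scratch" item, a minimal exact-DMD sketch in NumPy is given below. It assumes evenly spaced, fully observed snapshots stored as columns and a hand-picked truncation rank, and it is checked against a synthetic linear system whose eigenvalues are known. The function name dmd and the toy system are hypothetical choices for this example.

```python
import numpy as np


def dmd(snapshots, rank):
    """Exact DMD of a (state, time) snapshot matrix with evenly spaced columns."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]  # pairs such that Y ~ A X
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    U, s, Vt = U[:, :rank], s[:rank], Vt[:rank]
    # Reduced operator A_tilde = U^H Y V S^{-1}
    YVS = (Y @ Vt.conj().T) / s  # dividing columns by s applies S^{-1}
    A_tilde = U.conj().T @ YVS
    eigvals, W = np.linalg.eig(A_tilde)
    modes = YVS @ W  # exact DMD modes
    return eigvals, modes


# Toy system: a damped rotation in 2D, observed through a random 10D embedding
theta, decay = 0.3, 0.95
A = decay * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta), np.cos(theta)]])
z = np.zeros((2, 51))
z[:, 0] = [1.0, 0.0]
for k in range(50):
    z[:, k + 1] = A @ z[:, k]

rng = np.random.default_rng(0)
C = rng.standard_normal((10, 2))
snapshots = C @ z

eigvals, modes = dmd(snapshots, rank=2)
true_eigvals = decay * np.exp(1j * np.array([-theta, theta]))
dmd_eigvals = eigvals[np.argsort(eigvals.imag)]  # order by imaginary part
```

The eigenvalue magnitudes give the per-step decay and their angles give the rotation frequency. The SVD is the expensive step, which is where a randomized SVD would be substituted at scale.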