Dimensionality reduction and visualization - ProkopHapala/FireCore GitHub Wiki
UMAP
(Local) elastic strain minimization, (L)ESM
- Similarly to UMAP we want to keep the distances between points similar to those in the original space. We therefore minimize the metric $$E= \sum_i \sum_j K_{ij}\left(|\vec x_i - \vec x_j| - R_{ij}\right)^2$$
- The distances $R_{ij}$ are pre-computed in the original high-dimensional space (typically using the Euclidean metric).
- We simply minimize this elastic energy by something like gradient descent, or by dynamical (inertial) minimization.
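The scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the FireCore implementation; the function names and the uniform stiffness $K_{ij}=1$ are assumptions for the sketch.

```python
import numpy as np

def lesm_energy_grad(x, R, K):
    """Elastic energy E = sum_ij K_ij (|x_i - x_j| - R_ij)^2 and its gradient.
    x: (n, d) low-dimensional coordinates; R, K: (n, n) target distances / stiffnesses."""
    d = x[:, None, :] - x[None, :, :]        # pairwise displacement vectors
    r = np.linalg.norm(d, axis=2)            # current pairwise distances
    np.fill_diagonal(r, 1.0)                 # avoid division by zero (K_ii is 0 anyway)
    strain = r - R
    E = np.sum(K * strain**2)
    # dE/dx_i = 4 sum_j K_ij (r_ij - R_ij) (x_i - x_j)/r_ij  (factor 4: pair counted twice)
    g = 4.0 * np.sum((K * strain / r)[:, :, None] * d, axis=1)
    return E, g

def lesm_minimize(X_high, n_low=2, steps=2000, lr=0.01, seed=0):
    """Gradient-descent embedding of high-dimensional samples X_high into n_low dimensions."""
    n = len(X_high)
    R = np.linalg.norm(X_high[:, None, :] - X_high[None, :, :], axis=2)  # precomputed targets
    K = np.ones((n, n)); np.fill_diagonal(K, 0.0)                        # uniform stiffness
    x = np.random.default_rng(seed).normal(size=(n, n_low))              # random initial guess
    for _ in range(steps):
        E, g = lesm_energy_grad(x, R, K)
        x -= lr * g
    return x, E
```

Plain gradient descent is shown for simplicity; an inertial (FIRE-like) minimizer would converge faster on the same energy.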
- NOTE on initial conditions: the result depends on the initial guess of the low-dimensional vectors. To reduce this bias we can keep the original high-dimensional vectors and gradually apply elastic constraints in the directions with lower variance (squashing them), but how to do that requires deeper investigation. Iterative optimization with the high-dimensional vectors is also more costly.
- We can enforce that preserving distances between nearby points is more important than preserving distances between distant points by using a monotonically decaying function $K_{ij}=K(R_{ij})$, for example $K(R_{ij})=\exp(-\beta R_{ij})$.
- If we want to enforce locality and reduce computational cost we can use $K(R_{ij})$ in the form of a cutoff function such as smoothstep.
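Both weighting choices can be written as one-liners; the function names and the parameter defaults below are illustrative, not FireCore API.

```python
import numpy as np

def K_exp(R, beta=1.0):
    """Exponentially decaying stiffness K(R) = exp(-beta R): near neighbors dominate."""
    return np.exp(-beta * R)

def K_smoothstep(R, R_cut=2.0):
    """Smoothstep cutoff: stiffness falls smoothly from 1 at R=0 to exactly 0 at R_cut,
    so pairs beyond R_cut can be skipped entirely (sparsity, lower cost)."""
    t = np.clip(1.0 - R / R_cut, 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)
```

Because the smoothstep kernel is exactly zero beyond $R_{CUT}$, a neighbor-list built in the high-dimensional space reduces the pair sum from $O(n^2)$ to roughly $O(n)$.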
Clustered Local Elastic Strain Minimization, CLESM
- One problem LESM suffers from is that points which were originally distant can be pushed together, because they do not feel each other (the pair stiffness $K(R_{ij})$ is zero when $R_{ij}$ is beyond the cutoff radius $R_{CUT}$).
- This can be solved by a hierarchical approach where we first find $k$ representative pivots (e.g. by the k-means algorithm). These pivots $p_k$ are optimized first (i.e. minimizing the strain just between the pivots) with a much longer cutoff.
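The two-stage scheme can be sketched as follows. Everything here is a simplified illustration under assumptions: pivots are found by plain Lloyd k-means with farthest-point initialization, the pivot stage uses no cutoff at all, and the sample stage anchors each sample only to the (fixed) pivots rather than also to nearby samples.

```python
import numpy as np

def strain_descent(x, R, K, steps=500, lr=0.01):
    """Minimize sum_ij K_ij (|x_i - x_j| - R_ij)^2 by gradient descent."""
    for _ in range(steps):
        d = x[:, None, :] - x[None, :, :]
        r = np.linalg.norm(d, axis=2); np.fill_diagonal(r, 1.0)
        x = x - lr * 4.0 * np.sum((K * (r - R) / r)[:, :, None] * d, axis=1)
    return x

def clesm_embed(X, k=8, n_low=2, seed=0):
    """Two-stage CLESM sketch: (1) embed k pivots with a long (here infinite) cutoff,
    (2) place samples by decaying-stiffness springs to the fixed pivots."""
    rng = np.random.default_rng(seed)
    # Stage 0: pick pivots by Lloyd k-means; farthest-point init keeps it robust.
    P = [X[0]]
    for _ in range(k - 1):
        dmin = np.min(np.linalg.norm(X[:, None] - np.array(P)[None, :], axis=2), axis=1)
        P.append(X[np.argmax(dmin)])
    P = np.array(P)
    for _ in range(20):
        lab = np.argmin(np.linalg.norm(X[:, None] - P[None, :], axis=2), axis=1)
        for j in range(k):
            if np.any(lab == j): P[j] = X[lab == j].mean(axis=0)
    lab = np.argmin(np.linalg.norm(X[:, None] - P[None, :], axis=2), axis=1)
    # Stage 1: embed the pivots against each other with no cutoff,
    # so distant clusters cannot collapse onto each other.
    Rp = np.linalg.norm(P[:, None] - P[None, :], axis=2)
    Kp = np.ones((k, k)); np.fill_diagonal(Kp, 0.0)
    p_low = strain_descent(rng.normal(size=(k, n_low)), Rp, Kp)
    # Stage 2: each sample keeps springs to all pivots with decaying stiffness;
    # samples start near their own embedded pivot.
    Rsp = np.linalg.norm(X[:, None] - P[None, :], axis=2)   # sample-pivot targets
    Ksp = np.exp(-Rsp)                                      # decaying stiffness
    x = p_low[lab] + 0.1 * rng.normal(size=(len(X), n_low))
    for _ in range(300):
        d = x[:, None, :] - p_low[None, :, :]
        r = np.maximum(np.linalg.norm(d, axis=2), 1e-9)
        x = x - 0.01 * 2.0 * np.sum((Ksp * (r - Rsp) / r)[:, :, None] * d, axis=1)
    return x, p_low
```

In the full scheme, stage 2 would additionally use short-cutoff sample-sample springs; the pivot springs then only prevent distant points from drifting together.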
- This approach can be seen as analogous to multi-grid methods or the Fast Multipole Method.
Directional Clustered Local Elastic Strain Minimization
- Another problem which remains in LESM is that it does not preserve local directionality. We can solve this by projecting the displacements from the local pivots, $d_i = x_i - p_k$, onto some reduced linear space (locally we can use a linear projection) of dimension $D$.
- This reduced $D$-dimensional space is typically computed by Principal Component Analysis, either from the original samples $\{x_i\}$ or more often from the pivots $\{p_k\}$ (in that case $D<k$).
- The latter choice means that the displacements can express toward which pivots a sample is heading.
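The pivot-PCA projection can be sketched with a plain SVD; the function names are illustrative, not FireCore API, and `lab` is assumed to hold each sample's pivot index.

```python
import numpy as np

def pivot_pca_basis(P, D):
    """Top-D principal directions of the pivot set P, shape (k, n_high); needs D < k."""
    C = P - P.mean(axis=0)
    # SVD of the centered pivots: rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(C, full_matrices=False)
    return Vt[:D]                  # (D, n_high) orthonormal basis

def project_displacements(X, P, lab, V):
    """Express each sample by the D coordinates of its displacement d_i = x_i - p_k
    in the pivot-PCA basis V, preserving local directionality."""
    d = X - P[lab]                 # displacement from the assigned pivot
    return d @ V.T                 # (n, D) reduced coordinates
```

Since the basis is built from the pivots, a displacement's reduced coordinates directly encode toward which pivots the sample is shifted.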
- Perhaps a better way is for each sample to keep its distance to several pivots, not just to the closest one.