UMAP - hasselmonians/knowledge-base GitHub Wiki
Uniform Manifold Approximation and Projection
UMAP is a dimension reduction technique that can be used similarly to t-SNE or PCA, but also for general non-linear dimension reduction. The algorithm is founded on three assumptions about the data:
- The data are uniformly distributed on a Riemannian manifold.
- The Riemannian metric is locally constant (or can be approximated as such).
- The manifold is locally connected.
The details for the underlying mathematics can be found here:
McInnes, L, Healy, J, UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, ArXiv e-prints 1802.03426, 2018
Importantly, UMAP can be used as a drop-in replacement for t-SNE or PCA. Compared with PCA, both t-SNE and UMAP capture the non-linear structure of the data more faithfully. However, standard t-SNE does not learn a mapping that can be applied to new points, so adding new data to the dimensionally-reduced representation requires recomputing the whole embedding. UMAP does not have this limitation: a fitted model can place new data into the existing embedding.
Why UMAP?
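In the comparisons below, UMAP matches or outperforms the alternatives: it performs as well as t-SNE, far better than PCA, and its embeddings are more stable when the data are subsampled.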
Here is a comparison of UMAP to t-SNE and PCA. On the archetypal MNIST handwriting recognition task, t-SNE and UMAP both perform very well, and PCA does not.
When subsets of the data are taken, UMAP also maintains the same representation more faithfully, which indicates that its depiction is stable and reliable.
Installing UMAP in MATLAB
A UMAP wrapper has been written by Srinivas Gorur-Shandilya. To use UMAP in MATLAB, you need to do the following:
- Install UMAP using
conda install -c conda-forge umap-learn
- Install h5py using
conda install h5py
- Install condalab
- Install umap-matlab-wrapper
- Run conda.init in MATLAB to configure condalab
These instructions assume that you are using Anaconda to manage your Python installation. If you are not, you can use pip or some other tool.
Using UMAP in MATLAB
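u = umap();      % create a UMAP object with the default parameters
R = u.fit(x);    % embed the data x; R is the reduced representation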
Many parameters and options in UMAP are exposed in the object, and you can change these directly in MATLAB.
u = umap
umap with properties:
n_neighbors: 15
n_components: 2
metric: 'euclidean'
learning_rate: 1
min_dist: 0.1000
spread: 1
set_op_mix_ratio: 1
local_connectivity: 1
repulsion_strength: 1
negative_sample_rate: 5
transform_queue_size: 4
target_n_neighbors: -1
target_weight: 0.5000
transform_seed: 42
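u.n_neighbors = 10;          % use 10 neighbors instead of the default 15
u.metric = 'precomputed';    % the input to fit is then treated as a pairwise distance matrix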
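Setting the metric to 'precomputed' means the input is interpreted as pairwise distances rather than raw features. This is how the underlying umap-learn package defines the option; the wrapper is assumed to pass it through unchanged. As a rough sketch in Python with NumPy, using made-up data and a hypothetical helper name, such a matrix is square, symmetric, and has a zero diagonal:

```python
import numpy as np

def pairwise_euclidean(points):
    """Return the (n, n) matrix of Euclidean distances between the rows of `points`."""
    # Broadcasting builds every row-minus-row difference: shape (n, n, d).
    diffs = points[:, None, :] - points[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1))

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 5))   # placeholder data: 100 samples, 5 features
D = pairwise_euclidean(x)       # distance matrix suitable for a 'precomputed' metric

print(D.shape)
```

In umap-learn, a matrix like `D` would be passed as `umap.UMAP(metric='precomputed').fit_transform(D)`. In MATLAB, a comparable matrix can be built with `squareform(pdist(x))` if the Statistics and Machine Learning Toolbox is available.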