T distributed Stochastic Neighbor Embedding - AAU-Dat/P5-Nonlinear-Dimensionality-Reduction GitHub Wiki

t-distributed Stochastic Neighbor Embedding (t-SNE)

What is it used for?

t-SNE is used to show clusters in data in a lower dimension. Here it is primarily used to show things like which cells are close to each other gene-wise in a 2-D or 3-D plot. It is a very useful tool for clustering data for visual representation, but less useful as a tool for improving a model's accuracy.

What does it do?

t-SNE essentially makes it possible to describe data like that shown in the first picture in a better visual form.

[Picture 1: data that visually looks like two intertwined rings]

If a model like PCA were used here, it would create a blob, while t-SNE recreates the data in a much more useful way, as can be seen in picture 2.

[Picture 2: the two rings of data, untangled]

t-SNE is able to do this because it tries to reproduce the relative distances between points, and the clusters that exist in the higher dimensions. If two points are close in the higher dimensions, it tries to keep them close in the lower-dimensional reproduction. The distance is relative, not absolute: roughly, the closest 30 points to a given point should become its new closest 30 points, rather than all points within some absolute distance ending up within a new absolute distance in the reproduction.
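The idea that "the closest 30 points stay the closest 30" can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses scikit-learn's `PCA` and `TSNE`, a synthetic two-ring dataset (not the data from the pictures), and two helper functions, `two_rings` and `neighbour_overlap`, that are made up for this example. In scikit-learn the size of the neighbourhood is steered by the `perplexity` parameter, which is set to 30 here to match the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE


def two_rings(n_per_ring=100, noise=0.03, seed=0):
    """Two interlocked rings in 3-D (synthetic, illustrative data)."""
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, 2.0 * np.pi, n_per_ring)
    # Ring A lies in the x-y plane, centred at the origin.
    ring_a = np.column_stack([np.cos(t), np.sin(t), np.zeros_like(t)])
    # Ring B lies in the x-z plane, shifted so that it passes through ring A.
    ring_b = np.column_stack([1.0 + np.cos(t), np.zeros_like(t), np.sin(t)])
    points = np.vstack([ring_a, ring_b])
    points = points + rng.normal(0.0, noise, points.shape)
    labels = np.repeat([0, 1], n_per_ring)
    return points, labels


def knn_indices(points, k):
    """Indices of the k nearest neighbours of every point (excluding itself)."""
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)
    return np.argsort(dist, axis=1)[:, :k]


def neighbour_overlap(high, low, k=30):
    """Average fraction of each point's k nearest neighbours kept in the embedding."""
    nn_high = knn_indices(high, k)
    nn_low = knn_indices(low, k)
    kept = [len(set(a) & set(b)) / k for a, b in zip(nn_high, nn_low)]
    return float(np.mean(kept))


X, labels = two_rings()
X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, perplexity=30, init="pca",
              random_state=0).fit_transform(X)

overlap_pca = neighbour_overlap(X, X_pca)
overlap_tsne = neighbour_overlap(X, X_tsne)
print("PCA   neighbour overlap:", overlap_pca)
print("t-SNE neighbour overlap:", overlap_tsne)
```

An overlap of 1.0 would mean every point kept exactly the same 30 nearest neighbours. The exact numbers depend on the data, the random seed and the scikit-learn version, so they are not quoted here.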

How does it do it?

It essentially creates attraction between points based on proximity. This can be visualised very similarly to k-nearest neighbors: points that are neighbors attract each other, and everything that is not attracting is pushing away. The points are then put into a kind of particle simulation where they find an equilibrium based on these attractions and repulsions. The simulation works like the minimization of a loss function: it optimizes the points' positions so that the attractions are satisfied as well as possible while the repulsions are kept as small as possible. This is achieved with gradients (calculus).
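The particle-simulation picture can be sketched in a few lines of NumPy. This is a simplified illustration, not a full implementation: it assumes one fixed Gaussian width `sigma` for all points (the real algorithm tunes a width per point from the perplexity), plain gradient descent without momentum or early exaggeration, and hypothetical function names such as `tsne_sketch`. The loss that is minimized is the Kullback-Leibler divergence between the high-dimensional affinities `P` and the low-dimensional affinities `Q`; its gradient splits into a part that pulls neighbours together and a part that pushes all points apart.

```python
import numpy as np


def squared_distances(points):
    """Matrix of squared Euclidean distances between all pairs of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sum(diff ** 2, axis=-1)


def high_dim_affinities(X, sigma=1.0):
    """Symmetric neighbour probabilities P (assumption: one fixed sigma)."""
    logits = -squared_distances(X) / (2.0 * sigma ** 2)
    np.fill_diagonal(logits, -np.inf)            # a point is not its own neighbour
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    cond = np.exp(logits)
    cond /= cond.sum(axis=1, keepdims=True)      # row i: p(j | i)
    return (cond + cond.T) / (2.0 * len(X))      # symmetrize, sums to 1


def low_dim_affinities(Y):
    """Student-t based probabilities Q and the unnormalized weights w."""
    w = 1.0 / (1.0 + squared_distances(Y))
    np.fill_diagonal(w, 0.0)
    return w / w.sum(), w


def kl_loss(P, Q):
    """Kullback-Leibler divergence KL(P || Q) over pairs with P > 0."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / Q[mask])))


def tsne_sketch(X, n_steps=500, learning_rate=1.0, seed=0):
    rng = np.random.default_rng(seed)
    P = high_dim_affinities(X)
    Y = rng.normal(0.0, 1e-2, (len(X), 2))       # start near the origin
    losses = []
    for _ in range(n_steps):
        Q, w = low_dim_affinities(Y)
        losses.append(kl_loss(P, Q))
        diff = Y[:, None, :] - Y[None, :, :]     # y_i - y_j for all pairs
        # Gradient term from P: stepping against it pulls neighbours together.
        attract = 4.0 * np.sum((P * w)[:, :, None] * diff, axis=1)
        # Gradient term from Q: stepping against it pushes points apart.
        repel = -4.0 * np.sum((Q * w)[:, :, None] * diff, axis=1)
        Y = Y - learning_rate * (attract + repel)
    return Y, losses


# Illustrative data: two small clusters in 10 dimensions.
rng = np.random.default_rng(0)
cluster_a = rng.normal(0.0, 0.3, (20, 10))
cluster_b = rng.normal(0.0, 0.3, (20, 10))
cluster_b[:, 0] += 3.0
X = np.vstack([cluster_a, cluster_b])

Y, losses = tsne_sketch(X)
print("loss at start:", losses[0])
print("loss at end:  ", losses[-1])
```

The small learning rate is a deliberate choice for this tiny dataset so that plain gradient descent stays stable. Library implementations add momentum and other refinements and are a better choice for real data.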