FIt SNE - hasselmonians/knowledge-base GitHub Wiki

t-distributed stochastic neighborhood embedding is a powerful technique for dimensionality reduction and visualization of high-dimensional data sets. It generally performs much better than PCA; however the computational complexity of the algorithm limits it to small data sets (n < 3,000).

A popular implementation of t-SNE uses the Barnes-Hut algorithm to approximate the gradient at each iteration of gradient descent. This implementation, FFT-Accelerated Interpolation-based t-SNE (FIt-SNE) uses the fast Fourier transform to compute the convolution required in the Barnes-Hut algorithm across an interpolated equispaced grid. This dramatically speeds up computation time.

GNU/Linux & MacOS

Clone the repository or download the zip file. The only prerequisite the the fastest Fourier transform in the west (FFTW). It is accessible as a package on Debian- and Arch-based systems and can also be downloaded off the website.

The code needs to be compiled once. In the terminal, run:

g++ -std=c++11 -O3  src/sptree.cpp src/tsne.cpp src/nbodyfft.cpp  -o bin/fast_tsne -pthread -lfftw3 -lm

while in the repository's directory.

Windows

On Windows machines, the researchers have provided a binary. Download and extract to the bin\ folder in the repository. All FFTW dependencies have been included. All downloaded DLL files need to be added to the bin\ directory.

Running FIt-SNE

The code is written in C++. There are wrappers in R, Python, and MATLAB each called fast_tsne. For example, in MATLAB run

embedded_X = fast_tsne(X)

Options are passed as a secondary argument in a struct.