Materials - ItsLastDay/StackOverflow_Map GitHub Wiki

Articles

VP trees: A data structure for finding stuff fast
The article on which bhtsne's implementation of vantage-point (VP) trees is based;
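Here is a minimal Python sketch of a VP tree with k-nearest-neighbour search. It is not bhtsne's code. For simplicity it takes the first point of each subset as the vantage point, although a random choice is common. It splits the remaining points at the median distance from the vantage point. It then uses the triangle inequality to skip subtrees that cannot contain a closer neighbour. The names `VPNode`, `build` and `knn_search` are hypothetical. The distance must be a true metric for the pruning to be valid.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


@dataclass
class VPNode:
    point: Point                         # the vantage point
    radius: float = 0.0                  # median distance from the vantage point to the rest
    inside: Optional["VPNode"] = None    # points closer than radius
    outside: Optional["VPNode"] = None   # points at distance >= radius


def build(points: List[Point], dist: Metric = math.dist) -> Optional[VPNode]:
    if not points:
        return None
    vantage, rest = points[0], points[1:]  # simplification: first point is the vantage point
    node = VPNode(vantage)
    if rest:
        dists = [dist(vantage, p) for p in rest]
        node.radius = sorted(dists)[len(dists) // 2]
        node.inside = build([p for p, d in zip(rest, dists) if d < node.radius], dist)
        node.outside = build([p for p, d in zip(rest, dists) if d >= node.radius], dist)
    return node


def knn_search(root: Optional[VPNode], target: Point, k: int,
               dist: Metric = math.dist) -> List[Tuple[float, Point]]:
    heap: List[Tuple[float, Point]] = []  # max-heap of the best k so far, as (-distance, point)
    tau = math.inf                        # distance to the current k-th nearest neighbour

    def visit(node: Optional[VPNode]) -> None:
        nonlocal tau
        if node is None:
            return
        d = dist(target, node.point)
        if d < tau:
            heapq.heappush(heap, (-d, node.point))
            if len(heap) > k:
                heapq.heappop(heap)       # drop the farthest candidate
            if len(heap) == k:
                tau = -heap[0][0]
        # Triangle inequality: inside points are more than d - radius from the target,
        # outside points at least radius - d; skip a subtree that cannot beat tau.
        if d < node.radius:
            if d - tau < node.radius:
                visit(node.inside)
            if d + tau >= node.radius:
                visit(node.outside)
        else:
            if d + tau >= node.radius:
                visit(node.outside)
            if d - tau < node.radius:
                visit(node.inside)

    visit(root)
    return sorted((-neg_d, p) for neg_d, p in heap)


points = [(float(x), float(y)) for x in range(10) for y in range(10)]
tree = build(points)
# Nearest first: (3.0, 5.0), (3.0, 4.0), (4.0, 5.0)
print(knn_search(tree, (3.2, 4.7), k=3))
```

`knn_search` returns `(distance, point)` pairs sorted by distance. Barnes-Hut t-SNE uses this kind of query to find each point's nearest neighbours before it computes input similarities.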
How to use t-SNE effectively
The article covers common pitfalls when interpreting t-SNE results:

PCA preserves global structure, while t-SNE aims to preserve local structure (nearest neighbours);
the heavy-tailed Student-t distribution used in the map lets us place dissimilar points farther apart;
we can use t-SNE to evaluate our machine learning feature design (i.e. check that similar objects get similar features);
we can use t-SNE to spot weaknesses in the data (e.g. denormalization);
matrix factorization is used in machine learning because it gives a compact representation of the data, and the rows of the factor matrix can be used as points;
to plot co-authorship or synonym data we can use multiple maps t-SNE; the number of maps can be chosen by looking at the KL divergence as a function of the number of maps;
larger datasets can use a perplexity higher than 50.
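The perplexity note is easier to read once you know how t-SNE uses the parameter. Each point i gets its own Gaussian bandwidth. `p(j|i)` is proportional to `exp(-beta_i * d_ij^2)` with `beta_i = 1 / (2 * sigma_i^2)`. `beta_i` is found by binary search so that the perplexity `2^H(P_i)` matches the chosen value. The Python sketch below reproduces that search. The function names and toy distances are hypothetical and are not bhtsne's API.

The sketch also shows why perplexity behaves like an effective number of neighbours. At `beta = 0` the distribution is uniform, and the perplexity equals the number of neighbours. So a value above 50 only makes sense when each point has many more than 50 neighbours to spread probability over.

```python
import math
from typing import List, Sequence


def conditional_probs(sq_dists: Sequence[float], beta: float) -> List[float]:
    """p(j|i) proportional to exp(-beta * d_ij^2) over the neighbours j of point i."""
    # Shift by the smallest distance for numerical stability; it cancels when normalising.
    shift = min(sq_dists)
    weights = [math.exp(-beta * (d - shift)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]


def perplexity(probs: Sequence[float]) -> float:
    """2 ** H(P), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return 2.0 ** entropy


def calibrate_beta(sq_dists: Sequence[float], target: float,
                   tol: float = 1e-5, max_iter: int = 100) -> float:
    """Binary search for beta = 1 / (2 * sigma^2) that matches the target perplexity."""
    if not 1.0 < target < len(sq_dists):
        raise ValueError("target perplexity must lie between 1 and the number of neighbours")
    lo, hi, beta = 0.0, math.inf, 1.0
    for _ in range(max_iter):
        current = perplexity(conditional_probs(sq_dists, beta))
        if abs(current - target) < tol:
            break
        if current > target:
            # Distribution too flat: narrow the Gaussian (larger beta).
            lo = beta
            beta = beta * 2.0 if math.isinf(hi) else (beta + hi) / 2.0
        else:
            # Distribution too peaked: widen the Gaussian (smaller beta).
            hi = beta
            beta = (beta + lo) / 2.0
    return beta


# Hypothetical squared distances from one point to its 199 neighbours.
sq_dists = [float(j) for j in range(1, 200)]
beta = calibrate_beta(sq_dists, target=30.0)
print(beta, perplexity(conditional_probs(sq_dists, beta)))
```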

include runnable examples (and walk-throughs); Jupyter notebooks even allow JavaScript code inside;
the README should provide: context for the project, build instructions, limitations, example output;
provide a small test data set for the user to work on, so they can be sure their environment is set up correctly;
list dependencies explicitly (requirements.txt);
click (import click) makes it easy to build a command-line interface;
make works fine even for non-C++ commands;
always apply engineering practice: descriptive variable names, separate functions, etc.
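Below is a minimal sketch of the click and "separate functions" advice. It assumes click is installed, e.g. listed in requirements.txt. The command, its options, the sample path and the `describe_run` helper are all hypothetical. The core logic lives in a plain function, and the CLI is a thin wrapper around it. `click.testing.CliRunner` runs the command in-process, which is also a quick way to check the environment against a small test file.

```python
import click
from click.testing import CliRunner


def describe_run(input_path: str, perplexity: float, dims: int) -> str:
    """Hypothetical core logic, kept out of the CLI wrapper so it can be tested directly."""
    return f"embedding {input_path} into {dims}D with perplexity={perplexity}"


@click.command()
@click.argument("input_path", type=click.Path())
@click.option("--perplexity", type=float, default=30.0, show_default=True,
              help="Effective number of neighbours for t-SNE.")
@click.option("--dims", type=int, default=2, show_default=True,
              help="Dimensionality of the output map.")
def main(input_path: str, perplexity: float, dims: int) -> None:
    """Embed the points in INPUT_PATH (hypothetical example command)."""
    click.echo(describe_run(input_path, perplexity, dims))


# Run the command in-process, e.g. against a small bundled test file.
result = CliRunner().invoke(main, ["data/raw/sample.csv", "--perplexity", "50"])
print(result.output)
```

In a real script the last two statements would be replaced by the usual `if __name__ == "__main__": main()` guard. click generates `--help` from the docstring and the `help=` strings.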

src and data folders;
visualization folder inside src;
an analysis is a DAG of steps, so make is a good choice;
data is immutable: always include the raw data (or at least a script to obtain it).
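make treats each target as depending on its prerequisites, which forms a DAG. A target is rebuilt when it is missing or older than any prerequisite. The Python sketch below models that rule with the standard-library `graphlib.TopologicalSorter` (Python 3.9+). It is not a replacement for make. The file names are hypothetical, arranged to match the src/data layout above. A dict of timestamps stands in for the filesystem.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical pipeline in make's "target: prerequisites" shape.
PIPELINE: Dict[str, Set[str]] = {
    "data/interim/posts.csv": {"data/raw/Posts.xml"},
    "data/processed/features.npy": {"data/interim/posts.csv"},
    "data/processed/embedding.npy": {"data/processed/features.npy"},
    "reports/map.png": {"data/processed/embedding.npy", "src/visualization/plot.py"},
}


def build_order(pipeline: Dict[str, Set[str]]) -> List[str]:
    """Every file appears after all of the files it depends on."""
    return list(TopologicalSorter(pipeline).static_order())


def stale_targets(pipeline: Dict[str, Set[str]], mtimes: Dict[str, float]) -> Set[str]:
    """Targets to rebuild: missing, older than a prerequisite, or downstream of a rebuilt target."""
    stale: Set[str] = set()
    for node in build_order(pipeline):
        deps = pipeline.get(node)
        if not deps:
            continue  # source files such as raw data are never rebuilt (raw data is immutable)
        if node not in mtimes or any(
            d in stale or d not in mtimes or mtimes[d] > mtimes[node] for d in deps
        ):
            stale.add(node)
    return stale


# Hypothetical modification times: plot.py was edited after the map was drawn.
mtimes = {
    "data/raw/Posts.xml": 1, "data/interim/posts.csv": 2,
    "data/processed/features.npy": 3, "data/processed/embedding.npy": 4,
    "reports/map.png": 5, "src/visualization/plot.py": 6,
}
print(stale_targets(PIPELINE, mtimes))  # {'reports/map.png'}
```

Touching the raw file instead would mark every downstream target stale. That is the behaviour make gives you for free when the pipeline is written as a Makefile.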

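Understanding Resource Timing: how to interpret the timings in the Chrome DevTools Network panel. It notes that under HTTP/1.1 the browser keeps at most 6 connections open to a single host, so only 6 images can download from one web server at a time. HTTP/2 multiplexes many requests over a single connection, so we should switch to HTTP/2;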
A guide to setting up NGINX with HTTP/2 support.
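Here is a back-of-the-envelope Python model of why the 6-connection limit matters. If each connection carries one request at a time, images load in waves of at most 6. The figures used are assumptions: 30 images, 100 ms per request, and an HTTP/2 limit of 100 concurrent streams. The model also ignores bandwidth, TCP/TLS setup and server processing time. Treat it as an illustration, not a prediction.

```python
import math


def load_time_ms(num_requests: int, round_trip_ms: float, max_in_flight: int) -> float:
    """Idealised time to fetch num_requests resources with at most max_in_flight at once.

    Assumes each request costs exactly one round trip; ignores bandwidth,
    connection setup, and server processing time.
    """
    waves = math.ceil(num_requests / max_in_flight)
    return waves * round_trip_ms


images, rtt = 30, 100.0
http1 = load_time_ms(images, rtt, max_in_flight=6)    # 6 connections per host
http2 = load_time_ms(images, rtt, max_in_flight=100)  # assumed stream limit on one connection
print(http1, http2)  # 500.0 100.0
```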