Data Science General Resources - acwooding/dimension_reduction GitHub Wiki
A collection of useful, but not-directly-relevant resources we came across while doing this work:
Ensembles and Interpretability
For ensembles of trees (random forests, gradient boosting), here's a paper that addresses interpretability: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions
They've also done some specific work for feature importance of tree ensembles which is particularly nice: https://arxiv.org/abs/1802.03888
That's all well and good, but even better, there's a lovely little library that they've put together: https://github.com/slundberg/shap
For those of you wanting to read a general article by the package authors instead of the paper: https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27
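To make concrete what these papers compute: a SHAP value is just the classical Shapley value of a feature, with a coalition's "payoff" defined by evaluating the model with that coalition's features taken from the input and the rest from a baseline. Here is a minimal pure-Python sketch of that definition by brute-force subset enumeration (the toy model `f`, the input, and the baseline are all made up for illustration; this is the quantity the shap library approximates efficiently, not the library's own algorithm):

```python
import math
from itertools import combinations

def shapley_values(f, x, baseline):
    """Exact Shapley values for f at x, relative to a baseline input.

    v(S) evaluates f with the features in S taken from x and the rest
    from the baseline; phi_i is the weighted average of i's marginal
    contributions over all subsets of the other features. This is
    exponential in the number of features, which is exactly why
    TreeSHAP-style algorithms exist for tree ensembles.
    """
    n = len(x)

    def v(S):
        return f([x[i] if i in S else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi

# A non-additive toy model: the interaction term gets split evenly.
f = lambda z: z[0] * z[1]
phi = shapley_values(f, x=[2.0, 3.0], baseline=[0.0, 0.0])
# phi == [3.0, 3.0], and the values sum to f(x) - f(baseline) = 6.
```

The "local accuracy" property from the paper is visible here: the attributions always sum to the difference between the model's output at the input and at the baseline.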
Mappings between different embeddings of language
https://arxiv.org/pdf/1710.04087.pdf
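The refinement step in this paper solves the orthogonal Procrustes problem: given matched source and target embedding matrices X and Y, the orthogonal map W minimizing ||XW - Y|| has a closed-form SVD solution. A minimal numpy sketch on synthetic "embeddings" (the sizes, noise level, and matrices here are made up for illustration; the real method also learns the initial seed dictionary adversarially):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "source" and "target" embeddings for the same 50 words: the target
# space is an unknown orthogonal transform of the source, plus small noise.
X = rng.normal(size=(50, 4))
true_W = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # random orthogonal map
Y = X @ true_W + 0.01 * rng.normal(size=(50, 4))

# Orthogonal Procrustes: the W minimizing ||XW - Y||_F over orthogonal
# matrices is U V^T, where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# W is orthogonal by construction and approximately recovers true_W.
alignment_error = np.linalg.norm(X @ W - Y)
```

Restricting the map to be orthogonal is what makes this step stable: it preserves dot products within each space, so monolingual neighborhood structure survives the alignment.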
Model based word embeddings
https://arxiv.org/pdf/1710.04087.pdf
Word embeddings
Claim: word2vec's objective is exponential-family PCA on the integer matrix of co-occurrence counts, and the negative sampling estimator for the word2vec model parameters is a factorization of the shifted positive PMI matrix:
https://www.cs.jhu.edu/~jason/papers/cotterell+al.eacl17.pdf
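The "shifted positive PMI" object in that claim is concrete and easy to build: take pointwise mutual information from a co-occurrence matrix, shift it down by log k (k being the number of negative samples in SGNS), clip at zero, and factorize. A small numpy sketch, with a made-up 3-word co-occurrence matrix purely for illustration:

```python
import numpy as np

# Toy word-word co-occurrence counts (rows = words, columns = contexts).
C = np.array([[10., 2., 0.],
              [ 2., 8., 3.],
              [ 0., 3., 6.]])

total = C.sum()
p_w = C.sum(axis=1) / total   # word marginals
p_c = C.sum(axis=0) / total   # context marginals
p_wc = C / total              # joint probabilities

k = 5  # number of negative samples in the SGNS objective
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / np.outer(p_w, p_c))   # log(0) -> -inf is fine here
sppmi = np.maximum(pmi - np.log(k), 0.0)      # shifted positive PMI

# A truncated SVD of the SPPMI matrix plays the role of the word and
# context embedding matrices in the factorization view of word2vec.
U, S, Vt = np.linalg.svd(sppmi)
d = 2
word_vecs = U[:, :d] * np.sqrt(S[:d])
context_vecs = Vt[:d, :].T * np.sqrt(S[:d])
```

Under this view, a word vector dotted with a context vector approximates the corresponding SPPMI cell, which is what the SGNS objective implicitly optimizes.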
Variational Inference
"Variational Inference: A Review for Statisticians", https://arxiv.org/pdf/1601.00670.pdf