Data Science General Resources - acwooding/dimension_reduction GitHub Wiki
A collection of useful, but not-directly-relevant resources we came across while doing this work:
Ensembles and Interpretability
For ensembles of trees (random forests, gradient boosting), here's a paper that addresses interpretability: http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions
They've also done some specific work for feature importance of tree ensembles which is particularly nice: https://arxiv.org/abs/1802.03888
That's all well and good, but even better, there's a lovely little library that they've put together: https://github.com/slundberg/shap
For those of you wanting to read a general article by the package authors instead of the paper: https://towardsdatascience.com/interpretable-machine-learning-with-xgboost-9ec80d148d27
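To make concrete what these papers compute: a SHAP value is just the classical Shapley value of a feature, with a coalition's "payoff" defined by evaluating the model with that coalition's features taken from the input and the rest from a baseline. Here is a minimal pure-Python sketch of that definition by brute-force subset enumeration (the toy model `f`, the input, and the baseline are all made up for illustration; this is the quantity the shap library approximates efficiently, not the library's own algorithm):

```python
import math
from itertools import combinations

def shapley_values(f, x, baseline):
    """Exact Shapley values for f at x, relative to a baseline input.

    v(S) evaluates f with the features in S taken from x and the rest
    from the baseline; phi_i is the weighted average of i's marginal
    contributions over all subsets of the other features. This is
    exponential in the number of features, which is exactly why
    TreeSHAP-style algorithms exist for tree ensembles.
    """
    n = len(x)

    def v(S):
        return f([x[i] if i in S else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            for S in combinations(others, size):
                weight = (math.factorial(size) * math.factorial(n - size - 1)
                          / math.factorial(n))
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi

# A non-additive toy model: the interaction term gets split evenly.
f = lambda z: z[0] * z[1]
phi = shapley_values(f, x=[2.0, 3.0], baseline=[0.0, 0.0])
# phi == [3.0, 3.0], and the values sum to f(x) - f(baseline) = 6.
```

The "local accuracy" property from the paper is visible here: the attributions always sum to the difference between the model's output at the input and at the baseline.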
Mappings between different embeddings of language
https://arxiv.org/pdf/1710.04087.pdf
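The refinement step in this paper solves the orthogonal Procrustes problem: given matched source and target embedding matrices X and Y, the orthogonal map W minimizing ||XW - Y|| has a closed-form SVD solution. A minimal numpy sketch on synthetic "embeddings" (the sizes, noise level, and matrices here are made up for illustration; the real method also learns the initial seed dictionary adversarially):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "source" and "target" embeddings for the same 50 words: the target
# space is an unknown orthogonal transform of the source, plus small noise.
X = rng.normal(size=(50, 4))
true_W = np.linalg.qr(rng.normal(size=(4, 4)))[0]  # random orthogonal map
Y = X @ true_W + 0.01 * rng.normal(size=(50, 4))

# Orthogonal Procrustes: the W minimizing ||XW - Y||_F over orthogonal
# matrices is U V^T, where U S V^T is the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

# W is orthogonal by construction and approximately recovers true_W.
alignment_error = np.linalg.norm(X @ W - Y)
```

Restricting the map to be orthogonal is what makes this step stable: it preserves dot products within each space, so monolingual neighborhood structure survives the alignment.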
Model based word embeddings
https://arxiv.org/pdf/1710.04087.pdf
Word embeddings
Claim: word2vec's objective is exponential-family PCA on the integer matrix of co-occurrence counts, and the negative sampling estimator for the word2vec model parameters is a factorization of the shifted positive PMI matrix:
https://www.cs.jhu.edu/~jason/papers/cotterell+al.eacl17.pdf
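The "shifted positive PMI" object in that claim is concrete and easy to build: take pointwise mutual information from a co-occurrence matrix, shift it down by log k (k being the number of negative samples in SGNS), clip at zero, and factorize. A small numpy sketch, with a made-up 3-word co-occurrence matrix purely for illustration:

```python
import numpy as np

# Toy word-word co-occurrence counts (rows = words, columns = contexts).
C = np.array([[10., 2., 0.],
              [ 2., 8., 3.],
              [ 0., 3., 6.]])

total = C.sum()
p_w = C.sum(axis=1) / total   # word marginals
p_c = C.sum(axis=0) / total   # context marginals
p_wc = C / total              # joint probabilities

k = 5  # number of negative samples in the SGNS objective
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / np.outer(p_w, p_c))   # log(0) -> -inf is fine here
sppmi = np.maximum(pmi - np.log(k), 0.0)      # shifted positive PMI

# A truncated SVD of the SPPMI matrix plays the role of the word and
# context embedding matrices in the factorization view of word2vec.
U, S, Vt = np.linalg.svd(sppmi)
d = 2
word_vecs = U[:, :d] * np.sqrt(S[:d])
context_vecs = Vt[:d, :].T * np.sqrt(S[:d])
```

Under this view, a word vector dotted with a context vector approximates the corresponding SPPMI cell, which is what the SGNS objective implicitly optimizes.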
Variational Inference
"Variational Inference: A Review for Statisticians", https://arxiv.org/pdf/1601.00670.pdf