Useful links - derlin/bda-lsa-project GitHub Wiki
Here are some useful links related to various parts of the development:
- repo of the source code (from the book): https://github.com/sryza/aas
- wikidumps available for download:
- spark-corenlp: https://github.com/databricks/spark-corenlp
Doc
- LDA with MLLIB:
- LDA with ML:
Articles
- How-to: Tune Your Apache Spark Jobs, useful when running jobs on the daplab
- LDA on Databricks 1, LDA on Databricks 2, LDA on Databricks 3 (using Pipelines)
- Extracting Topics from Tweets and Webpages for the IDEAL Project (Word document), a report on using LDA for tweet clustering.
- Layman's explanation of topic modeling with LDA, a very simple and accessible explanation for non-specialists!
- Introduction to Latent Dirichlet Allocation
- Quora: why does LDA work?
- Spark – LDA : A Complete example of clustering algorithm for topic discovery