Topic Modelling - SoojungHong/MachineLearning GitHub Wiki

Many methods for topic modelling have been proposed. Two general approaches are popular:

1. Probabilistic approaches: View each document as a mixture of a small number of topics. Words and documents get probability scores for each topic, e.g. Latent Dirichlet Allocation (LDA) (Blei et al., 2003).

2. Matrix factorisation approaches: Apply methods from linear algebra to decompose a single matrix (e.g. a document-term matrix) into a set of smaller matrices. For text data, we can interpret these as a topic model, e.g. Non-negative Matrix Factorisation (NMF) (Lee & Seung, 1999).
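To make the document-term matrix concrete, here is a minimal sketch that builds one from raw term counts. The three documents, the whitespace tokenisation and the helper name `doc_term_matrix` are assumptions made for illustration only; real pipelines usually add stop-word removal and term weighting.

```python
from collections import Counter

# Toy corpus (made up for illustration).
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "stocks fell as markets closed",
]

# Vocabulary: one column per distinct term, in sorted order.
vocab = sorted({term for doc in docs for term in doc.split()})


def doc_term_matrix(docs, vocab):
    """Return a list of rows; entry [i][j] counts term j in document i."""
    rows = []
    for doc in docs:
        counts = Counter(doc.split())  # missing terms count as 0
        rows.append([counts[term] for term in vocab])
    return rows


A = doc_term_matrix(docs, vocab)
print(len(A), "documents x", len(A[0]), "terms")
```

Each row of `A` describes one document and each column one term, which is the layout the factorisation approaches take as input.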

Input: a document-term matrix (n × m), for n documents and m terms.

Output: two matrices, a document-topic matrix (n × k) and a topic-term matrix (k × m), where k is the number of topics.
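As a rough illustration of these shapes, the sketch below factorises a small made-up document-term matrix using multiplicative update rules for the squared-error objective. It is an illustrative pure-Python solver and not necessarily the method used in the referenced slides; the function names (`matmul`, `transpose`, `nmf`, `squared_error`), the toy matrix `V`, the iteration count and the seed are all assumptions. In practice a library implementation such as scikit-learn's `NMF` would normally be used instead.

```python
import random


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(V, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate V (n x m) as W (n x k) times H (k x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H


def squared_error(V, W, H):
    """Sum of squared differences between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra))


# Toy document-term matrix: n = 5 documents, m = 4 terms (made-up counts).
V = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
    [1, 1, 1, 1],
]

W, H = nmf(V, k=2)
print("document-topic:", len(W), "x", len(W[0]))  # 5 x 2
print("topic-term:", len(H), "x", len(H[0]))      # 2 x 4
print("squared reconstruction error:", squared_error(V, W, H))
```

Each row of `W` gives one document's weight on the k topics, and each row of `H` gives one topic's weight on the m terms; the highest-weighted terms in a row of `H` are usually read as that topic's description.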

Reference: http://derekgreene.com/slides/topic-modelling-with-scikitlearn.pdf