LDA Output - sameerwadkar/largelda GitHub Wiki

LDA OUTPUT

When the program is executed in the createmodel mode, a folder is created in $PLDA_WORKING_DIR, which is defined in TopicModeling.properties.
A sub-folder is created under it, named with the date and time in the format YYYYMMDDhhmmssSSS. This folder contains all the working files for the model generation.
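To pick out the most recent run, or simply to read a run folder's name as a timestamp, the name can be parsed as in the sketch below. It assumes that SSS means milliseconds (the Java SimpleDateFormat convention) and that the folder name is exactly 17 digits. The function name parse_run_folder is hypothetical and not part of largelda.

```python
from datetime import datetime, timedelta


def parse_run_folder(name):
    """Parse a run folder name of the form YYYYMMDDhhmmssSSS.

    Assumption: the trailing SSS is milliseconds.
    """
    if len(name) != 17 or not name.isdigit():
        raise ValueError("expected 17 digits (YYYYMMDDhhmmssSSS), got %r" % name)
    # The first 14 digits are the date and time down to the second.
    base = datetime.strptime(name[:14], "%Y%m%d%H%M%S")
    # The last 3 digits are assumed to be milliseconds.
    return base + timedelta(milliseconds=int(name[14:]))


if __name__ == "__main__":
    # Hypothetical folder name, for illustration only.
    print(parse_run_folder("20240115103045123"))
```

Because the format is fixed-width and ordered from year down to millisecond, sorting the folder names as plain strings also gives chronological order.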
The output of the model is contained in the sub-folders "printTopicWords" and "topicByDocs". The first contains CSV files identified by iteration number; each file lists $PLDA_NO_OF_WORDS_PER_TOPIC words per topic id. The second contains the topic distribution by document. The format is $DOC_ID,{($TOPIC_ID,$TOPIC_PROBABILITY)}, where each document carries multiple ($TOPIC_ID,$TOPIC_PROBABILITY) pairs, one per topic id.
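A line in the topicByDocs format can be read with a small parser such as the sketch below. It rests on several assumptions not confirmed by this page: the pairs inside the braces are separated by commas, as in doc42,{(0,0.25),(3,0.75)}; the document id is everything before the first ",{"; and topic ids are integers. The function name parse_topic_by_doc_line and the sample line are hypothetical.

```python
import re

# Matches one "(topic_id,probability)" pair inside the braces.
_PAIR = re.compile(r"\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)")


def parse_topic_by_doc_line(line):
    """Split one topicByDocs line into (doc_id, [(topic_id, probability), ...]).

    Assumptions: pairs are comma-separated inside {...}, and topic ids
    are integers.
    """
    doc_id, sep, bag = line.strip().partition(",{")
    if not sep or not bag.endswith("}"):
        raise ValueError("unexpected line format: %r" % line)
    # Drop the closing brace, then collect every (id, probability) pair.
    pairs = [(int(t), float(p)) for t, p in _PAIR.findall(bag[:-1])]
    return doc_id, pairs


if __name__ == "__main__":
    # Hypothetical sample line, for illustration only.
    doc_id, topics = parse_topic_by_doc_line("doc42,{(0,0.25),(3,0.75)}")
    # Report the most probable topic for this document.
    print(doc_id, max(topics, key=lambda pair: pair[1]))
```

If the actual files use a different separator between pairs, only the handling of the text inside the braces should need to change.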
The actual model is stored at the path given by $PLDA_MODEL_PATH.