LDA Output - sameerwadkar/largelda GitHub Wiki
LDA OUTPUT
-
When the program is executed in createmodel mode, a folder is created in $PLDA_WORKING_DIR, which is defined in TopicModeling.properties.
-
A sub-folder is created under it, named with the datetime in the format YYYYMMDDhhmmssSSS. This folder contains all the working files for the model generation.
-
The output of the model is contained in the sub-folders "printTopicWords" and "topicByDocs". The first contains CSV files identified by iteration number; each file lists $PLDA_NO_OF_WORDS_PER_TOPIC words per topic id. The second contains the topic distribution by document, in the format $DOC_ID,{($TOPIC_ID,$TOPIC_PROBABILITY)}, where the braces hold multiple pairs, one probability per topic id.
-
The actual model is stored at the path given by $PLDA_MODEL_PATH.
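
To make the folder naming and the "topicByDocs" format described above more concrete, here is a minimal Python sketch for reading the output. It is not part of largelda. The helper names (`parse_run_timestamp`, `latest_run_dir`, `parse_topic_by_doc_line`) and the sample line are hypothetical. The sketch also assumes that the pairs inside the braces are separated by commas and that the caller passes in the directory that directly holds the timestamped sub-folders.

```python
import re
from datetime import datetime, timedelta
from pathlib import Path

# Run folders are named YYYYMMDDhhmmssSSS: exactly 17 digits.
RUN_DIR_NAME = re.compile(r"[0-9]{17}")
# One ($TOPIC_ID,$TOPIC_PROBABILITY) pair; comma-separated pairs are an assumption.
TOPIC_PAIR = re.compile(r"\(\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*\)")


def parse_run_timestamp(name):
    """Return the datetime encoded in a YYYYMMDDhhmmssSSS name, or None."""
    if not RUN_DIR_NAME.fullmatch(name):
        return None
    try:
        seconds = datetime.strptime(name[:14], "%Y%m%d%H%M%S")
    except ValueError:  # 17 digits, but not a valid calendar date or time
        return None
    return seconds + timedelta(milliseconds=int(name[14:]))


def latest_run_dir(parent):
    """Return the most recent timestamped sub-folder of parent, or None."""
    runs = [p for p in Path(parent).iterdir()
            if p.is_dir() and parse_run_timestamp(p.name) is not None]
    return max(runs, key=lambda p: parse_run_timestamp(p.name), default=None)


def parse_topic_by_doc_line(line):
    """Split one topicByDocs line into (doc_id, {topic_id: probability})."""
    doc_id, sep, bag = line.strip().partition(",{")
    if not sep or not bag.endswith("}"):
        raise ValueError(f"unexpected topicByDocs line: {line!r}")
    pairs = TOPIC_PAIR.findall(bag[:-1])
    return doc_id, {topic_id: float(prob) for topic_id, prob in pairs}


# Hypothetical sample line, used only to illustrate the format.
doc_id, topics = parse_topic_by_doc_line("doc42,{(3,0.61),(7,0.39)}")
print(doc_id, max(topics, key=topics.get))  # prints: doc42 3
```

Topic ids are kept as strings so that the sketch does not assume they are numeric. Because the folder names are fixed-width and zero-padded, sorting them as plain strings would give the same order as sorting by the parsed timestamp.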