Topic Mapping Pipeline

Export Model Module

The Export Model module gathers the data generated by the Topic Model module and produces concise model data that can be used by other applications. These data are saved as topic JSON file(s) and, optionally, as document CSV file(s).

The Export Model module is contained in the P3_TopicModelling package.

Specifications

The Export Model module entry in the project file should have the following structure:

{...
  "exportTopicModel": {
    "topics" | "mainTopics": "path",
    "subTopics": "path",
    "documents": "path",
    "docFields": ["key", ... ],
    "output" | "mainOutput": "path",
    "subOutput": "path",
    "mainOutputCSV": "path",
    "subOutputCSV": "path",
    "outputCSV": "path",
    "numWordId": 3
  },
...}

| Name | Description | Optional | Default |
| --- | --- | --- | --- |
| `topics` or `mainTopics` (if the model is hierarchical) | Path to the (main) topics JSON file * | No | |
| `subTopics` | Path to the sub topics JSON file * | Required if the model is hierarchical | `""` ** |
| `documents` | Path to the documents JSON file * | No | |
| `docFields` | List of keys, in documents' `docData`, to export on file (JSON and CSV) *** | Yes | `[]` |
| `output` or `mainOutput` (if the model is hierarchical) | Path to the exported (main) topics JSON file **** | Yes | `""` (no export) |
| `subOutput` | Path to the exported sub topics JSON file **** | Yes | `""` (no export) |
| `mainOutputCSV` | Path to the document CSV file listing documents and their weights in main topics **** | Yes | `""` (no export) |
| `subOutputCSV` | Path to the document CSV file listing documents and their weights in sub topics **** | Yes | `""` (no export) |
| `outputCSV` | Path to the document CSV file listing documents and their weights in both main and sub topics; if the model is non-hierarchical, this is equivalent to `mainOutputCSV` **** | Yes | `""` (no export) |
| `numWordId` | Number of labels used to identify topics in document CSV files | Yes | `3` |

* These paths are relative to the data directory;
** This default value implies a non-hierarchical model; if the model type meta-parameter is set to hierarchical, a path must be provided;
*** This gets overwritten by the document fields meta-parameter (if set);
**** These paths are relative to the output directory.
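
The required and conditional parameters above can be checked before running the pipeline. The following is a minimal Python sketch, not part of the pipeline itself: the function name `check_export_entry` and every file name in the sample entry are hypothetical, and the `hierarchical` flag stands in for the model type meta-parameter.

```python
from typing import Any


def check_export_entry(entry: dict[str, Any], hierarchical: bool = False) -> list[str]:
    """Return the problems found in an "exportTopicModel" entry (empty if none)."""
    problems = []
    # The (main) topics path may be given under either name.
    if not (entry.get("topics") or entry.get("mainTopics")):
        problems.append('missing "topics" / "mainTopics"')
    if not entry.get("documents"):
        problems.append('missing "documents"')
    # "subTopics" defaults to "" and is only required for hierarchical models.
    if hierarchical and not entry.get("subTopics", ""):
        problems.append('hierarchical model requires "subTopics"')
    return problems


# Hypothetical entry for a non-hierarchical model; all paths are placeholders.
entry = {
    "topics": "topics.json",
    "documents": "documents.json",
    "docFields": ["title"],
    "output": "exportedTopics.json",
    "outputCSV": "exportedDocuments.csv",
    "numWordId": 3,
}

print(check_export_entry(entry))                     # []
print(check_export_entry(entry, hierarchical=True))  # ['hierarchical model requires "subTopics"']
```

The check mirrors only the "Optional" column of the table; it does not verify that the paths exist in the data or output directories.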

Output

The Export Model module outputs multiple files.

First, it writes the topic JSON file, which follows a structure similar to that of the topic files generated by the Topic Model modules:

{
  "metadata": { ... },
  "topics": [
    {
      "topicId": "0",
      "topicIndex": 0,
      "topDocs": [{
        "docId": "id", 
        "weight": 0.7778, 
        "docData": {
          "wordCount": 100,
          "key1": "value1",
          "key2": "value2",
          ...
        }
      }, ... ],
      "topWords": [{"label": "risk", "weight": 85.0}, ... ],
      "subTopicIds": [ ... ],
      "mainTopicIds": [ ... ]
    }, ...
  ]
}

Note that docData has been added to each top document, containing a list of key-value pairs, following the docFields specification, as well as the wordCount for that document.
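
As an illustration of how another application might consume this file, here is a short Python sketch that collects the top labels and the `docData` of the top documents for each topic. The inline sample only mimics the structure shown above; its values (including the label "cost") and the helper name `summarise_topics` are assumptions made for the example.

```python
import json

# Small inline sample following the structure above; values are illustrative.
sample = """
{
  "metadata": {},
  "topics": [
    {
      "topicId": "0",
      "topicIndex": 0,
      "topDocs": [
        {"docId": "d1", "weight": 0.7778,
         "docData": {"wordCount": 100, "key1": "value1"}}
      ],
      "topWords": [{"label": "risk", "weight": 85.0},
                   {"label": "cost", "weight": 40.0}]
    }
  ]
}
"""


def summarise_topics(model: dict, num_words: int = 3) -> dict:
    """Map each topicId to its top labels and the docData of its top documents."""
    summary = {}
    for topic in model["topics"]:
        # Keep at most num_words labels, in the order they appear in the file.
        labels = [w["label"] for w in topic["topWords"][:num_words]]
        docs = [(d["docId"], d["docData"]) for d in topic["topDocs"]]
        summary[topic["topicId"]] = {"labels": labels, "docs": docs}
    return summary


summary = summarise_topics(json.loads(sample))
print(summary["0"]["labels"])  # ['risk', 'cost']
```

With a real export, `json.loads(sample)` would be replaced by `json.load` on the file written to the `output` (or `mainOutput`) path.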

Then, if set in the specifications, the document CSV file is written with this structure:

"_docId", "key1",   "key2",   ..., "_wordCount", "_inModel", "_inferred", "_mainTopic_topic-1-labels", "_mainTopic_topic-2-labels", ...
"0",      "value1", "value2", ..., "107",        "true",     "false",     "0.0197",                    "0.0099",                    ...

Each row represents a document, with key1, key2, etc. being the keys set in docFields. The CSV also includes the wordCount per document, whether the document was included in the model, and whether the document was inferred (this column is only set if at least one document was inferred). Finally, for each topic, identified by a list of its top labels, there is a column giving the weight of that topic in the document. Each topic identifier is also prefixed with either _mainTopic_ or _subTopic_ to indicate which model it comes from.
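
The `_mainTopic_` prefix makes it straightforward to pick out the topic columns when reading the CSV. The Python sketch below finds the highest-weighted main topic for each document. The sample rows are invented, and the exact form of the label part of each column name (here `risk-cost-plan`) is an assumption, since the structure above only shows placeholders.

```python
import csv
import io

# Inline sample in the layout above; the topic label columns are made up.
sample = (
    '"_docId","key1","_wordCount","_inModel",'
    '"_mainTopic_risk-cost-plan","_mainTopic_data-model-test"\n'
    '"0","value1","107","true","0.0197","0.0099"\n'
    '"1","value2","54","true","0.0010","0.3500"\n'
)


def dominant_main_topics(handle) -> dict:
    """Map each _docId to the main-topic column with the highest weight."""
    # skipinitialspace tolerates padding after the comma separators.
    reader = csv.DictReader(handle, skipinitialspace=True)
    result = {}
    for row in reader:
        weights = {name: float(value) for name, value in row.items()
                   if name.startswith("_mainTopic_")}
        result[row["_docId"]] = max(weights, key=weights.get)
    return result


best = dominant_main_topics(io.StringIO(sample))
print(best)
# {'0': '_mainTopic_risk-cost-plan', '1': '_mainTopic_data-model-test'}
```

Filtering on `_subTopic_` instead would give the same result for the sub topic model in a hierarchical export.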

ExportModule_v2 - Strategic-Futures-Lab/Topic_Mapping_Pipeline GitHub Wiki
