Micro Libraries wtf_fetch, .... - spencermountain/wtf_wikipedia GitHub Wiki

If we regard wtf as a "Wiki Transformation Framework" in which all submodules (micro libraries) follow the naming scheme wtf_my_module, then a wtf Node application or WebApp chains micro libraries in a specific order to perform a job.

  • wtf_fetch has already been extracted. Fetching was previously one of the main jobs performed inside `wtf_wikipedia` (i.e. in v7.3.0).
  • wtf_tokenizer tokenizes mathematical expressions and citations and stores the parsed content in a JSON structure similar to that of `wtf_wikipedia`. Options control whether citations, mathematical expressions, or both are tokenized. The tokenizer is required for Wiki2Reveal.
  • wtf_url (not implemented yet) transforms, for example, relative links into absolute links, or replaces URLs for images and other media with absolute links to WikiCommons.
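Since wtf_url is not implemented yet, the following is only a minimal sketch of what its link transformation could look like. The function name `absolutizeLinks`, its options, and the choice to rewrite internal links into external-link syntax are assumptions made for illustration. The sketch leaves namespaced links (such as `File:` or `Category:`) untouched and does not handle nested links.

```javascript
// Hypothetical sketch of a wtf_url-style step (not an existing API).
// Rewrites internal links [[Title|label]] into absolute external links
// of the form [https://<language>.<domain>.org/wiki/Title label].
function absolutizeLinks(wiki, { language = 'en', domain = 'wikipedia' } = {}) {
  const base = `https://${language}.${domain}.org/wiki/`;
  return wiki.replace(/\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]/g, (match, title, label) => {
    // Assumption: namespaced links (File:, Category:, ...) are left as they are.
    if (title.includes(':')) return match;
    const url = base + encodeURI(title.trim().replace(/ /g, '_'));
    return `[${url} ${label || title}]`;
  });
}

const sample = 'See [[Water cycle|the cycle]] and [[Rain]]. [[File:Drop.png]]';
console.log(absolutizeLinks(sample));
```

The language and domain parameters mirror the ones wtf_fetch already uses, so the same settings could be passed to both steps of a chain.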

The processing chains are connected with Promises in the root wtf, which is now wtf_wikipedia.

If wtf_url were already implemented for transforming relative links and URLs for images, videos and audio into absolute links, then one example processing chain would be:

wtf_fetch > wtf_url > wtf_wikipedia
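A chain like this could be wired with Promises roughly as sketched below. All three stages are stubs introduced for illustration (`wtfFetchStub`, `wtfUrlStub`, `wtfParseStub`), so the example runs offline and does not call the real wtf_fetch or wtf_wikipedia APIs. The helper `chain` is likewise a hypothetical name.

```javascript
// Stand-in stages (assumptions): each takes a value and returns a Promise.
const wtfFetchStub = (title) =>
  Promise.resolve(`'''${title}''' is linked from [[Main Page]].`);

const wtfUrlStub = (wiki) =>
  Promise.resolve(
    wiki.replace(/\[\[([^\]|]+)\]\]/g, (match, title) =>
      `[https://en.wikipedia.org/wiki/${title.replace(/ /g, '_')} ${title}]`)
  );

const wtfParseStub = (wiki) => Promise.resolve({ wiki, length: wiki.length });

// chain(...stages) returns a function that pipes its input through the
// stages in order; each stage may return a plain value or a Promise.
const chain = (...stages) => (input) =>
  stages.reduce((promise, stage) => promise.then(stage), Promise.resolve(input));

// wtf_fetch > wtf_url > wtf_wikipedia, expressed with the stubs above.
const pipeline = chain(wtfFetchStub, wtfUrlStub, wtfParseStub);
pipeline('Example').then((doc) => console.log(doc.wiki));
```

Because every stage has the same shape (one input, one Promise), swapping the last stage for a different parser only changes the argument list passed to `chain`.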

wtf_fetch downloads the wiki markup, wtf_url converts relative links to absolute links, and at the very end of the pipe the converted wiki source is processed with wtf(wiki) using the current npm module wtf_wikipedia. If a developer wants to parse the wiki markup with Parsoid instead, the processing chain would look like this:

wtf_fetch > wtf_url > parsoid

If a special Parsoid wrapper for the Wiki Transformation Framework (WTF) were implemented, the module would be named wtf_parsoid, and the processing chain for wtf_wiki2html with absolute link transformation would look like this:

wtf_wiki2html := wtf_fetch > wtf_url > wtf_parsoid

Current Structure of wtf_wikipedia

The processing of the library wtf_wikipedia can be split into the following 3 jobs:

  • wtf_fetch, which fetches the wiki source from Wikipedia, Wikiversity, etc. (any MediaWiki domain) with the parameters language (e.g. en, de, ...) and domain (e.g. wikipedia, wikiversity, wikivoyage, ...)
  • wtf_parse, which parses the wiki source into a Document object (Abstract Syntax Tree)
  • wtf_output, which generates/renders the output for a specific format from a given Document object. The output modes are attached to the tree nodes in the Abstract Syntax Tree (AST). The current tree nodes are defined in the wtf_wikipedia directory src. The order in which they are parsed is indicated by the number prefix of the folder name. Each tree node has output rendering functions for every available format (plaintext, latex, markdown, html).
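The idea of attaching output functions to tree nodes can be illustrated with a toy example. The classes `Sentence` and `Section` below are simplified stand-ins written for this page, not the actual classes in the wtf_wikipedia src directory, and only three of the formats are shown.

```javascript
// Toy AST nodes (assumption: simplified stand-ins, not the real classes).
// Each node type carries one rendering method per output format.
class Sentence {
  constructor(text) { this.text = text; }
  plaintext() { return this.text; }
  markdown() { return this.text; }
  html() { return `<span class="sentence">${this.text}</span>`; }
}

class Section {
  constructor(title, sentences) {
    this.title = title;
    this.sentences = sentences;
  }
  // Render all child nodes in the requested format and join them.
  body(format) { return this.sentences.map((s) => s[format]()).join(' '); }
  plaintext() { return `${this.title}\n${this.body('plaintext')}`; }
  markdown() { return `## ${this.title}\n${this.body('markdown')}`; }
  html() { return `<h2>${this.title}</h2>\n<p>${this.body('html')}</p>`; }
}

const section = new Section('History', [new Sentence('First.'), new Sentence('Second.')]);
console.log(section.markdown());
console.log(section.html());
```

With this layout, adding a new export format means adding one method per node type, which is why a separate export micro library would need access to the node definitions.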

The micro library wtf_fetch was extracted from wtf_wikipedia to perform just the cross-fetch download of the wiki source into the browser or Node.js environment for further processing.

Proposed Micro Libraries for WTF

The following micro libraries may be implemented. If you contribute to the Wiki Transformation Framework (WTF) and implement one of these libraries, please add a link to the repository to this Wiki document.

  • wtf_fetch, which fetches the wiki source from Wikipedia, Wikiversity, etc. (any MediaWiki domain) with the parameters language (e.g. en, de, ...) and domain (e.g. wikipedia, wikiversity, wikivoyage, ...)
  • wtf_wiki2odt converts a wiki markup source into a LibreOffice document. The template ODT file can be loaded with LoadFile4DOM. An ODT file is just a ZIP file with a specific internal folder and file structure. The ODT file can be handled with JSZip even in a browser, and the content.xml in the ODT file can be replaced by wtf_wiki2odt. This would allow the client-side generation of a LibreOffice document directly from the wiki source.
  • wtf_book_creator is a client-side book creator that extracts links/references from articles previously downloaded with wtf_wikipedia. Categories and links can be used in a Wiki Book Creator to suggest new articles in Wikipedia or Wikiversity to be appended to the client-side generated wiki book. Together with wtf_wiki2odt, the output of the book could be a LibreOffice file for further editing.
  • wtf_url is a micro library for link and URL transformations in the Wiki Transformation Framework (WTF)
  • wtf_wiki2reveal could be an additional export module for the Wiki Transformation Framework (WTF) (see Wiki2Reveal in Wikiversity for the current rapid prototype as proof of concept)
  • wtf_tokenizer handles content elements that need special transformations (e.g. citations or mathematical expressions). The tokenizer replaces these content elements with tokens before wtf_wikipedia parses the wiki markup. After wtf_wikipedia has processed the wiki markup, the tokens are replaced by the specific syntax of the export format.
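The two tokenizer phases (replace before parsing, restore after rendering) could look roughly like the sketch below for mathematical expressions. The function names `tokenizeMath` and `restoreMath`, the token format `___MATH_n___`, and the per-format delimiters are assumptions chosen for illustration. They do not describe the actual wtf_tokenizer implementation, and whether such a token survives parsing unchanged would need to be verified against wtf_wikipedia.

```javascript
// Hypothetical sketch of the tokenizer idea (names and token format are assumptions).
// Phase 1: replace <math>...</math> elements with placeholder tokens.
function tokenizeMath(wiki) {
  const tokens = [];
  const text = wiki.replace(/<math>([\s\S]*?)<\/math>/g, (match, formula) => {
    tokens.push(formula);
    return `___MATH_${tokens.length - 1}___`;
  });
  return { text, tokens };
}

// Phase 2: after the parser has produced its output, put each formula back
// using the syntax of the chosen export format.
function restoreMath(output, tokens, format) {
  return output.replace(/___MATH_(\d+)___/g, (match, index) => {
    const formula = tokens[Number(index)];
    return format === 'latex' ? '$' + formula + '$' : '\\(' + formula + '\\)';
  });
}

const { text, tokens } = tokenizeMath('Energy is <math>E = mc^2</math>.');
// In a real chain, `text` would be parsed and rendered by wtf_wikipedia here.
console.log(text);
console.log(restoreMath(text, tokens, 'latex'));
```

Keeping the formulas in a separate array means the parser never sees their content, so characters that are meaningful in wiki markup cannot be misinterpreted inside a formula.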