Translations import update - Gapminder/waffle-server GitHub Wiki

Translations import

Translations import flow

The translations import process uses datapackage.json to load translations. The process consists of the following steps:

  1. According to the spec, the translation location is determined by joining lang, language_tag and the path from the datapackage.json resources, so the lookup path pattern for the loader is ./lang/${language_tag}/${path.from.resource}.

  2. The translation file is then loaded row by row and filtered by data type (concept, entity, datapoint).

  3. At this point the process splits and differs depending on which data type is being updated:

    • Concept: the translation is added to the target, which is looked up using the following property:
      • unique identifier (concept column)
    • Entity: the translation is added to the target, which is looked up using the following properties:
      • path
      • entity identifier (defined by primaryKey)
      • is-- columns (e.g. is--country = true or is--city = false)
    • Datapoint: the translation is added to the target, which is looked up using the following properties:
      • path
      • dimensions
      • indicator
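The path construction in step 1 and the per-type lookup properties in step 3 can be sketched as follows. This is only an illustration under assumptions: the function names, the shape of the returned lookup key, and the resource layout (`path` plus `schema.primaryKey`) are hypothetical and are not taken from the waffle-server code base.

```python
import posixpath


def translation_path(language_tag: str, resource_path: str) -> str:
    """Build the lookup path ./lang/<language_tag>/<resource path>."""
    return "./" + posixpath.join("lang", language_tag, resource_path)


def _primary_key(resource: dict) -> list:
    """Return primaryKey as a list (assumed to be a string or a list of strings)."""
    primary_key = resource["schema"]["primaryKey"]
    return [primary_key] if isinstance(primary_key, str) else list(primary_key)


def target_lookup_key(data_type: str, row: dict, resource: dict) -> dict:
    """Collect the properties used to find a translation target (hypothetical shape)."""
    if data_type == "concept":
        # Concepts are identified by the value of the concept column only.
        return {"concept": row["concept"]}

    if data_type == "entity":
        key = {"path": resource["path"], "id": row[_primary_key(resource)[0]]}
        # is-- columns narrow the match down to particular entity sets.
        key.update({c: v for c, v in row.items() if c.startswith("is--")})
        return key

    if data_type == "datapoint":
        dimensions = _primary_key(resource)
        return {
            "path": resource["path"],
            "dimensions": {c: row[c] for c in dimensions},
            "indicators": [c for c in row if c not in dimensions],
        }

    raise ValueError(f"unknown data type: {data_type}")
```

For example, `translation_path("nl-nl", "ddf--concepts.csv")` gives `./lang/nl-nl/ddf--concepts.csv`; the language tag and file name here are illustrative only.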

Translations update

WS-CLI generates a diff file (a.k.a. the translation actions file) that lists the removed, updated and created translations for each translation target (entity, concept, datapoint).

The translations update flow is based entirely on the diff file and the latest data version of the dataset.

The update proceeds in this order:

1. All removed translations are removed from their translation targets.
2. Then new translations are created and existing translations are updated (in parallel).
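The ordering of the two phases can be sketched with asyncio. This is a minimal sketch, not the actual implementation: the `removed`, `created` and `updated` keys of the diff and the `handlers` mapping are assumptions made for illustration.

```python
import asyncio


async def apply_translation_diff(diff: dict, handlers: dict) -> None:
    """Run removals first, then creations and updates concurrently."""
    # Phase 1: removals must finish before anything else starts.
    await handlers["remove"](diff.get("removed", []))
    # Phase 2: creations and updates do not depend on each other.
    await asyncio.gather(
        handlers["create"](diff.get("created", [])),
        handlers["update"](diff.get("updated", [])),
    )
```

Each handler is expected to be an async function that accepts the list of actions for its phase.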

From here, let's look at each phase of the update process (remove, create, update).

Translations update flow

The flow consists of the following steps:

  1. Read translation actions file
  2. Filter actions by target type (concept, entity, datapoint)
  3. Filter actions by action type (remove, update, change, create)
  4. Find the translation target in the latest version of the dataset:
    • for a datapoint: using a query that consists of the measure, the number of dimensions and the dimensions (entities).
    • for an entity: using a query that consists of the domain, the entity sets, the entity identifier and the name of the source file where the entity is located.
    • for a concept: using a query that consists of the concept's unique identifier.
  5. Apply the action to the target:
    • remove:
      • if the target was already marked as removed in the current version, we remove the given translation from the target;
      • if the target was updated in the current transaction, we remove the given translation from the target;
      • if the target was not updated in the current transaction, we mark the target as removed and create a new one without the given translation.
    • update, create: if the translation target was created within the current transaction, we just add the translation to the target. If the target was not touched within the current transaction, we mark the target as deleted and create a new one with the translation added.
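The decision in step 5 can be sketched as below. The `Target` class, its `created_in` and `removed_in` fields and the in-memory `store` list are hypothetical stand-ins for the stored documents. As a simplification, the sketch treats a target as touched for every action type when it was created in, or marked as removed in, the current version.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Target:
    created_in: int                      # version that created this document
    removed_in: Optional[int] = None     # version that marked it as removed
    translations: Dict[str, dict] = field(default_factory=dict)


def apply_action(store: List[Target], target: Target, action: str,
                 language: str, translation: Optional[dict], version: int) -> Target:
    """Apply one translation action and return the target that now holds the result."""
    touched = version in (target.created_in, target.removed_in)
    if not touched:
        # Close the untouched document and continue with a fresh copy.
        target.removed_in = version
        target = Target(created_in=version,
                        translations=copy.deepcopy(target.translations))
        store.append(target)

    if action == "remove":
        target.translations.pop(language, None)
    else:  # "create" or "update"
        target.translations[language] = translation
    return target
```

Copying before modifying keeps the previous version of the target intact, so earlier dataset versions still see the translations they had.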

There are a couple of details worth mentioning in this flow:

  1. We always use the previous version of datapackage.json when processing a removal action and the current version of datapackage.json for all other actions.

  2. A datapoint row in a csv file may contain several indicators, for example:

    geo,year,population,popularity_index
    ukr,2017,42000000,42
    

    As we can see, this row has 2 indicators: population and popularity_index.

    When a row changes, say an indicator column is removed, we should generate a REMOVE action for every removed indicator, because internally WS stores datapoints on a per-indicator basis.

    It is also possible that a column removal (e.g. an indicator removal) and other changes (e.g. a new dimension is added) happen at the same time. In this case, from one update action we should generate removal actions for the removed indicators and apply the update itself (because it brings a new dimension). For example, taking the csv above as the initial state, suppose it was changed to:

    geo,year,age,population
    ukr,2017,34,42000000
    

    From this example we get 1 REMOVE action (for the popularity_index indicator) and 1 UPDATE action for the newly added dimension.

    If columns are only removed from a datapoint row, the UPDATE action is transformed into a set of REMOVE actions, one per removed indicator column.
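The splitting of one UPDATE action described in the second detail can be sketched like this. It assumes the old and new rows are available as dicts keyed by column name and that the dimension columns of the old row are known; the function name and the action dict shape are hypothetical.

```python
from typing import Dict, List


def split_update_action(old_row: Dict[str, str], new_row: Dict[str, str],
                        dimensions: List[str]) -> List[dict]:
    """Turn one UPDATE of a datapoint row into REMOVE actions plus an optional UPDATE."""
    missing = [c for c in old_row if c not in new_row]
    removed_indicators = [c for c in missing if c not in dimensions]
    removed_dimensions = [c for c in missing if c in dimensions]

    # One REMOVE per removed indicator, since datapoints are stored per indicator.
    actions = [{"type": "remove", "indicator": c} for c in removed_indicators]

    added = [c for c in new_row if c not in old_row]
    changed = [c for c in new_row if c in old_row and new_row[c] != old_row[c]]

    # Keep the UPDATE unless removing indicator columns was the only change.
    if added or changed or removed_dimensions or not removed_indicators:
        actions.append({"type": "update", "row": new_row})
    return actions
```

With the two csv states from the example above and `dimensions=["geo", "year"]`, this yields one remove action for popularity_index followed by one update action. If the new row only drops popularity_index, the result is a single remove action.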