Study GKMConsistencyDesign - geokrety/geokrety-website GitHub Wiki

This page aims to define the GeoKretyMap (GKM) consistency check job: issue#328

A new stateless microservice

The GKM consistency check is a standalone entity, a building block of service (a microservice) with its own life cycle:

  • a new dedicated subproject must be created in the geokrety organization (to avoid mixing this with the website code)
  • it has read-only access to the GeoKrety database
  • it can be written in any language (not necessarily PHP like the website)
  • it is configured by environment variables

Configuration

Job configuration is the set of all config/parameters/attributes used as consistency check business logic input:

  • (from geokrety database) current gk-geokrety table
  • a job startup trigger (a cron entry config)
  • job config entries:

config:

  • gkm_api_endpoint: GeoKretyMap API endpoint
  • gkm_export_basic: GeoKretyMap basic export location (example)
  • gkm_consistency_batch_size: batch size, used as the limit of each geokrety select
  • gkm_consistency_roll_min_days: minimum number of days between two rolls
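
As an illustration of configuration through environment variables, here is a minimal sketch. The upper-case variable names and the default values are assumptions made for this example; only the four config keys come from the list above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class JobConfig:
    gkm_api_endpoint: str
    gkm_export_basic: str
    gkm_consistency_batch_size: int
    gkm_consistency_roll_min_days: int


def load_config(env=None):
    """Build the job configuration from environment variables."""
    env = os.environ if env is None else env
    return JobConfig(
        # Hypothetical variable names: upper-cased config keys.
        gkm_api_endpoint=env["GKM_API_ENDPOINT"],
        gkm_export_basic=env["GKM_EXPORT_BASIC"],
        # The defaults below are illustrative assumptions, not project values.
        gkm_consistency_batch_size=int(env.get("GKM_CONSISTENCY_BATCH_SIZE", "1000")),
        gkm_consistency_roll_min_days=int(env.get("GKM_CONSISTENCY_ROLL_MIN_DAYS", "7")),
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.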

Design of a GKM consistency job

Driven by the job configuration, the goal of a GKM consistency job is to compare a GKM export with all geokrety table entries.

Job definition:

  • a rollId (unique identifier) is defined (cf. below)
  • a cache of GKM data is produced and stored in redis:
    • by reading an XML basic export from GeoKretyMap
    • each geokretymap (geokrety type) entry is stored in redis
  • the geokrety table is read in one or more batches (depending on the data and on gkm_consistency_batch_size, noted X):
    • the first batch uses the current datetime as max datetime and selects X geokrety ordered by creation date, descending
    • each following batch uses the oldest timestamp of the previous result as max datetime
    • the roll ends when a new batch returns no result
  • each batch compares its X geokrety with the related GKM state in redis
    • a new log entry is created each time an unsync geokrety is detected
  • at the end of the roll, a new log entry is added with the overall result: total number of geokrety analyzed, total number of unsync geokrety
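
The batch walk described above can be sketched as follows. This is a simplified illustration, not the job itself: the database and redis are replaced by in-memory structures, the `created` field name is hypothetical, and the comparison only looks at the name.

```python
def select_batch(rows, max_created, limit):
    """In-memory stand-in for a query such as:
    SELECT ... WHERE created < :max_created ORDER BY created DESC LIMIT :limit
    """
    older = [r for r in rows if r["created"] < max_created]
    older.sort(key=lambda r: r["created"], reverse=True)
    return older[:limit]


def run_roll(roll_id, rows, gkm_cache, batch_size, now, log):
    """Walk the geokrety rows from newest to oldest, batch by batch."""
    analyzed = unsync = 0
    max_created = now  # the first batch starts from the current datetime
    while True:
        batch = select_batch(rows, max_created, batch_size)
        if not batch:  # an empty batch ends the roll
            break
        for gk in batch:
            analyzed += 1
            # Simplified comparison on the name only (assumption).
            if gkm_cache.get(gk["id"]) != gk["name"]:
                unsync += 1
                log.append({"rollId": roll_id, "geokretyId": gk["id"]})
        # The oldest timestamp of this batch bounds the next one.
        max_created = batch[-1]["created"]
    # Final log entry carrying the totals of the roll.
    log.append({"rollId": roll_id, "analyzed": analyzed, "unsync": unsync})
    return analyzed, unsync
```

With a strict comparison on the timestamp, rows sharing the boundary timestamp of a batch could be skipped; adding a tie-breaker on the id would be one way to handle this.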

Job throttling

  • rollId and rollEndDate are stored on redis
  • no rollId and no rollEndDate means that we never had a consistency job in the past
  • rollId value starts from 1 the first time and is incremented by one (redis atomic counter)
  • rollEndDate value is -1 when an analysis is in progress
  • rollEndDate value is the positive timestamp of the last ended analysis
  • a new job can start if and only if (rollId is null) or (rollId is set, rollEndDate is positive, and rollEndDate + gkm_consistency_roll_min_days days < now())
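
The start condition above can be written as a small predicate. This sketch assumes that rollEndDate and the current time are Unix timestamps in seconds; the function name is hypothetical.

```python
SECONDS_PER_DAY = 86400


def can_start_job(roll_id, roll_end_date, roll_min_days, now):
    """Return True when a new consistency job is allowed to start."""
    if roll_id is None:
        # No job has ever run.
        return True
    if roll_end_date is None or roll_end_date <= 0:
        # -1 means an analysis is in progress.
        return False
    return roll_end_date + roll_min_days * SECONDS_PER_DAY < now
```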

Admin point of view

Grafana should include

  • a view of compared and unsync geokrety counts over time

Compare Geokrety with GKM entries

The following geokrety fields will be used to compare a gk-geokrety entry with the related GKM data:

  • id
  • name
  • ownerName
  • distanceTraveledKm
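
A field-by-field comparison could look like the sketch below. It assumes both sides are available as dictionaries keyed by the field names above, and treats a missing GKM entry as unsync on every field, which is a choice made for this example.

```python
COMPARED_FIELDS = ("id", "name", "ownerName", "distanceTraveledKm")


def unsync_fields(gk, gkm):
    """Return the names of the fields that differ between both sides."""
    if gkm is None:
        # Assumption: no GKM entry at all means every field is unsync.
        return list(COMPARED_FIELDS)
    return [f for f in COMPARED_FIELDS if gk.get(f) != gkm.get(f)]
```

An empty list means the geokret is in sync.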

Job outputs (result)

Each produced log entry must embed tags corresponding to the current business logic. When applicable:

  • rollId
  • geokretyId
  • unsync field(s)
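
One possible shape for such a tagged log entry is a JSON line, sketched below. The JSON format, the `message` key and the `unsyncFields` key are assumptions; only rollId, geokretyId and the unsync fields come from the list above.

```python
import json


def build_log_entry(message, roll_id=None, geokrety_id=None, unsync_fields=None):
    """Serialize a log entry, embedding only the tags that apply."""
    entry = {"message": message}
    if roll_id is not None:
        entry["rollId"] = roll_id
    if geokrety_id is not None:
        entry["geokretyId"] = geokrety_id
    if unsync_fields:
        entry["unsyncFields"] = list(unsync_fields)
    return json.dumps(entry, sort_keys=True)
```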

Job metrics

We could define a redis entry per compare result

  • gkm_sync_ok_(id): value is the timestamp of the last successful compare
  • gkm_sync_ko_(id): value is a map: first_time => timestamp of the first unsuccessful compare, last_time => timestamp of the last unsuccessful compare, count => number of unsuccessful compares, reason => result of the last unsuccessful compare

The metrics gauges endpoint provides:

  • gkm_sync_ok_*: number of sync geokrety
  • gkm_sync_ko_*: number of unsync geokrety
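
The entries and gauges above could be maintained as in this sketch. A plain dict stands in for redis, and removing the opposite key on each update (so that a geokret is counted by exactly one gauge) is an assumption of this example.

```python
def record_result(store, gk_id, now, reason=None):
    """Record one compare result; reason is None when the compare succeeded."""
    ok_key = f"gkm_sync_ok_{gk_id}"
    ko_key = f"gkm_sync_ko_{gk_id}"
    if reason is None:
        store[ok_key] = now
        store.pop(ko_key, None)  # assumption: a success clears the ko entry
    else:
        ko = store.get(ko_key) or {"first_time": now, "count": 0}
        ko.update(last_time=now, count=ko["count"] + 1, reason=reason)
        store[ko_key] = ko
        store.pop(ok_key, None)  # assumption: a failure clears the ok entry


def gauges(store):
    """Count sync and unsync geokrety from the key prefixes."""
    return {
        "gkm_sync_ok": sum(k.startswith("gkm_sync_ok_") for k in store),
        "gkm_sync_ko": sum(k.startswith("gkm_sync_ko_") for k in store),
    }
```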

Improvements

Centralized data and logs

We need to design and implement a solution to search over the data and/or logs of all geokrety services (application, database, services, ...).

Possible candidates are

  • minio (github): a high performance object storage server compatible with Amazon S3 APIs
  • ELK stack (github): Elasticsearch, Logstash, Kibana stack