# Study GKMConsistencyDesign (geokrety/geokrety-website GitHub Wiki)

This page aims to define the GeoKretyMap (GKM) consistency check job: issue#328
## A new stateless microservice

The GKM consistency check is an entity of its own, a building brick of the service, a microservice with its own lifecycle:
- a new dedicated subproject must be created in the geokrety organization (to avoid mixing it with the website code)
- it has read-only access to the GeoKrety database
- it can be written in any language (not necessarily PHP like the website)
- it is configured by environment variables
## Configuration

The job configuration is the set of all parameters used as input to the consistency check business logic:
- (from the GeoKrety database) the current `gk-geokrety` table
- a job startup trigger (a cron entry)
- job config entries:
  - `gkm_api_endpoint`: GeoKretyMap API endpoint
  - `gkm_export_basic`: GeoKretyMap basic export location (example)
  - `gkm_consistency_batch_size`: batch size, i.e. the limit of each geokrety SELECT
  - `gkm_consistency_roll_min_days`: minimum number of days between two rolls
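Since the service is configured by environment variables, its startup could look like the following sketch (Python chosen arbitrarily since the language is open; variable names mirror the config entries above, and the default values are illustrative assumptions, not documented choices):

```python
import os

def load_config(env=os.environ):
    """Load the job configuration from environment variables.

    Sketch only: the env var names mirror the config entries above,
    and the defaults are illustrative assumptions.
    """
    return {
        "gkm_api_endpoint": env.get("GKM_API_ENDPOINT", ""),
        "gkm_export_basic": env.get("GKM_EXPORT_BASIC", ""),
        "gkm_consistency_batch_size": int(env.get("GKM_CONSISTENCY_BATCH_SIZE", "500")),
        "gkm_consistency_roll_min_days": int(env.get("GKM_CONSISTENCY_ROLL_MIN_DAYS", "7")),
    }
```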
## Design of a GKM consistency job

Started from the job configuration, the goal of a GKM consistency job is to compare a GKM export with all geokrety table entries.

Job definition:
- a rollId (unique identifier) is defined (cf. below)
- a cache of GKM data is produced and stored in Redis:
  - an XML basic export is read from GeoKretyMap
  - each GeoKretyMap entry (geokrety type) is stored in Redis
- the geokrety table is read in one or more batches (depending on the data and `gkm_consistency_batch_size`, noted X):
  - the first batch uses the current datetime and selects X geokrety ordered by creation date, descending
  - each following batch uses the oldest timestamp of the previous result as its max datetime
  - the roll ends when a new batch returns no result
- each batch compares its X geokrety with the related GKM state in Redis
- a new log entry is created each time an unsynced geokrety is detected
- at the end of the roll, a new log entry is added with the roll result: total geokrety analyzed and total unsynced geokrety
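The batching scheme above can be sketched as a loop (Python; `fetch_batch` and `compare` are hypothetical helpers standing in for the database SELECT and the Redis lookup):

```python
from datetime import datetime

def run_roll(fetch_batch, compare, batch_size, now=None):
    """Sketch of the batch loop described above.

    fetch_batch(max_datetime, limit) is assumed to return geokrety rows
    ordered by creation date descending, each with a 'created' datetime.
    compare(row) returns True when the row matches the cached GKM state.
    """
    max_dt = now or datetime.utcnow()
    analyzed = unsync = 0
    while True:
        batch = fetch_batch(max_dt, batch_size)
        if not batch:            # an empty batch marks the end of the roll
            break
        for row in batch:
            analyzed += 1
            if not compare(row):
                unsync += 1      # the real job would log the unsync entry here
        max_dt = batch[-1]["created"]  # oldest timestamp becomes next max
    return analyzed, unsync
```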
## Job throttling

- `rollId` and `rollEndDate` are stored in Redis
- no `rollId` and no `rollEndDate` means that no consistency job has ever run
- `rollId` starts at `1` the first time and is incremented by one on each roll (Redis atomic counter)
- `rollEndDate` is `-1` while an analysis is in progress
- `rollEndDate` is the positive timestamp of the last completed analysis
- we may start a new job if and only if (rollId is null) or (rollId is set, rollEndDate is positive, and rollEndDate + `gkm_consistency_roll_min_days` days < now())
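The throttling rule can be written as a pure predicate over the values read from Redis (a sketch; timestamps are assumed to be in seconds, and `-1` marks a roll still in progress as above):

```python
def may_start_roll(roll_id, roll_end_date, min_days, now_ts):
    """Throttling rule above as a pure function over values read from Redis.

    Sketch with assumed units: timestamps in seconds since the epoch,
    roll_end_date == -1 while an analysis is in progress.
    """
    if roll_id is None:
        return True                  # no consistency job ever ran
    if roll_end_date is None or roll_end_date < 0:
        return False                 # a roll is still in progress
    return roll_end_date + min_days * 86400 < now_ts
```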
## Admin point of view

Grafana should include:
- a view of compared and unsynced geokrety counts over time
## Compare GeoKrety with GKM entries

The following geokrety fields will be used to compare a `gk-geokrety` entry with the related GKM data:
- id
- name
- ownerName
- distanceTraveledKm
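A minimal comparison over those four fields could look like this (sketch; both sides are assumed to be dict-shaped records keyed by the field names above):

```python
def compare_geokret(gk_row, gkm_entry):
    """Compare the fields listed above.

    Returns the list of unsynced field names (empty when both sides
    agree). Sketch with assumed dict-shaped inputs.
    """
    fields = ("id", "name", "ownerName", "distanceTraveledKm")
    return [f for f in fields if gk_row.get(f) != gkm_entry.get(f)]
```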
## Job outputs (result)

Each produced log must embed tags corresponding to the current business logic, where applicable:
- rollId
- geokretyId
- unsynced field(s)
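One possible shape for such a tagged log entry (the JSON structure and field names are illustrative assumptions, not a fixed schema):

```python
import json

def log_unsync(roll_id, geokrety_id, unsync_fields):
    """Build one structured log entry for an unsynced geokret,
    tagged with the fields listed above (illustrative format)."""
    return json.dumps({
        "event": "gkm_unsync",
        "rollId": roll_id,
        "geokretyId": geokrety_id,
        "unsyncFields": unsync_fields,
    }, sort_keys=True)
```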
## Job metrics

We could define a Redis entry per compare result:
- `gkm_sync_ok_(id)`: value is the timestamp of the last successful compare
- `gkm_sync_ko_(id)`: value is a map: `first_time` => first unsuccessful compare timestamp, `last_time` => last unsuccessful compare timestamp, `count` => number of unsuccessful compares, `reason` => last unsuccessful compare result

A metrics gauges endpoint provides:
- `gkm_sync_ok_*`: number of synced geokrety
- `gkm_sync_ko_*`: number of unsynced geokrety
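The per-result entries and the two gauges above can be sketched as follows (a plain dict stands in for Redis; key names follow the `gkm_sync_ok_`/`gkm_sync_ko_` convention, everything else is an assumption):

```python
import time

def record_compare(store, gk_id, ok, reason=None, now=None):
    """Update the per-geokret compare result described above.

    `store` stands in for Redis (a plain dict in this sketch)."""
    now = now if now is not None else int(time.time())
    if ok:
        store.pop(f"gkm_sync_ko_{gk_id}", None)
        store[f"gkm_sync_ok_{gk_id}"] = now
    else:
        store.pop(f"gkm_sync_ok_{gk_id}", None)
        ko = store.setdefault(f"gkm_sync_ko_{gk_id}",
                              {"first_time": now, "count": 0})
        ko.update(last_time=now, count=ko["count"] + 1, reason=reason)

def gauges(store):
    """Counts exposed by the metrics endpoint: number of synced and
    unsynced geokrety, derived from the key prefixes."""
    ok = sum(1 for k in store if k.startswith("gkm_sync_ok_"))
    ko = sum(1 for k in store if k.startswith("gkm_sync_ko_"))
    return {"gkm_sync_ok": ok, "gkm_sync_ko": ko}
```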
## Improvements

### Centralized data and logs

We need to design and implement a solution to search over data and/or logs of all GeoKrety services (application, database, services, ...).
Possible candidates are