Metadata Configuration - ja-guzzle/guzzle_docs GitHub Wiki

Configurations are as good as source code

We have to keep in mind that configurations are as good as source code. Those using Guzzle will be setting up tonnes of configuration as part of their development phase. All of this has to be under version control.

Verdict

We are not the first to build this kind of framework. Gobblin and Airflow use files. A few others, like Oracle DAC, use tables. ETL tools like ODI and Informatica use tables. Others, like OBIEE, use binary files.

Verdict: We will use files: YAML (for most of the configurations) and JSON (for serialization and some sets of configurations). JSON vs YAML: https://www.json2yaml.com/yaml-vs-json

Comparison (just in case we have a dispute)

The following are the pros and cons of keeping metadata configurations in database tables vs files.

Database

Pros

  1. Easy to insert and query
  2. Ensure data types - provides some level of validation during inserts

Cons

  1. Requires roundabout
  2. Can't version control (the conf)
  3. Rigid: can't extend the configuration easily
  4. Multi-level configuration needs complicated child/foreign key relationships

Files (json/yaml)

Pros

  1. Simple, ....

Metadata config guidelines

File naming convention

  1. All file names should follow the pattern xxx.yaml
  2. File names should preferably be lowercase. File names are case sensitive (caps are allowed as well :) )
  3. Each module can have a default location for its config, and that path can probably sit in the global config of that module (example: guzzle-ingestion)
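
The naming rules above could be checked mechanically. The following is a minimal sketch, not part of Guzzle: the function name `check_config_name` and the warning texts are hypothetical, and it treats lowercase as a preference (a warning) rather than a hard rule, since caps are allowed.

```python
from pathlib import PurePosixPath
from typing import List


def check_config_name(file_name: str) -> List[str]:
    """Return convention warnings for a config file name (hypothetical helper)."""
    warnings = []
    path = PurePosixPath(file_name)
    # Rule 1: config files are named xxx.yaml
    if path.suffix != ".yaml":
        warnings.append("expected a .yaml extension")
    # Rule 2: lowercase is preferred, although caps are allowed
    if path.name != path.name.lower():
        warnings.append("lowercase names are preferred")
    return warnings
```

An empty list means the name follows both conventions, so a check like this could run as part of a review or commit hook without blocking names that merely use caps.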

Global vs Parent vs Local configuration

  1. Some configs are truly global at the entire Guzzle level: guzzle-global.yaml
  2. Other configs are global to the entire ingestion module: guzzle-ingestion.yaml
  3. Others are common to all the configs in a folder, for example ingestion-common.yaml. The file name is specific to each module. This file can hold all the configs that are common across the folder, so that we avoid repetition.
  4. You can also inherit config from another config. This means there should be an import property that allows specifying a relative or absolute path. Example: define ingestion-system1-common.yaml and import it in the individual interface configs to ensure all of them get the same treatment. Inheritance can be a nuisance at times, though, as it takes the superset of configs from both files, and overriding happens at the level of individual config properties, not whole sections.
  5. If we adopt the concept of inheriting configuration, this can get complex at times, as the full set of configs that apply has to be um-tot
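
To make the layering concrete, here is a minimal sketch of property-level merging across the global, module, folder and interface layers. It assumes that more specific layers win over more general ones, which the notes above do not state explicitly. The function names and the sample property names (`logging`, `source`, `delimiter`, and so on) are hypothetical, and plain dicts stand in for parsed yaml files.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(base: Dict[str, Any], override: Dict[str, Any]) -> Dict[str, Any]:
    """Merge override into base property by property, not section by section."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides have a section here: recurse so sibling properties survive
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_layers(*layers: Dict[str, Any]) -> Dict[str, Any]:
    """Apply layers from most general to most specific (assumed precedence)."""
    result: Dict[str, Any] = {}
    for layer in layers:
        result = merge_config(result, layer)
    return result


# Hypothetical contents standing in for the parsed yaml files
guzzle_global = {"logging": {"level": "info"}, "timezone": "utc"}
guzzle_ingestion = {"source": {"format": "csv", "delimiter": ","}}
ingestion_common = {"source": {"delimiter": "|"}, "logging": {"level": "debug"}}
interface_config = {"source": {"path": "/data/system1/orders"}}

effective = resolve_layers(
    guzzle_global, guzzle_ingestion, ingestion_common, interface_config
)
```

Here `effective["source"]` keeps `format` from the ingestion layer, takes `delimiter` from the folder layer and `path` from the interface config. This is the superset behaviour described in item 4: an override replaces a single property, and the rest of the section is still inherited.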

Config file guidelines

  1. All property values in lowercase
  2. We should agree on a convention for boolean values: should it be true/false, Y/N, Yes/No, or y/n? Other fixed values, such as status, need a convention as well
  3. Using underscore vs "." when defining config. I am ok with either. I like _
  4. Organize the config well so that it is easy to read (or even auto-generate). Specifically, these configs may sometimes be auto-generated from Excel or some sort of UI
  5. Let's put all the common
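
Until a boolean convention is agreed, one option is for the config loader to accept every spelling listed in item 2 and normalise them to a single value. The sketch below assumes that approach. The name `parse_flag` and the accepted-value sets are hypothetical, not an agreed Guzzle convention.

```python
# Hypothetical accepted spellings, compared case-insensitively
_TRUE_VALUES = {"true", "y", "yes"}
_FALSE_VALUES = {"false", "n", "no"}


def parse_flag(value) -> bool:
    """Normalise a config flag to a Python bool, rejecting unknown spellings."""
    if isinstance(value, bool):
        # Already a real boolean, e.g. parsed from yaml true/false
        return value
    text = str(value).strip().lower()
    if text in _TRUE_VALUES:
        return True
    if text in _FALSE_VALUES:
        return False
    raise ValueError(f"unrecognised boolean value: {value!r}")
```

Rejecting unknown spellings with an error, instead of silently treating them as false, keeps typos in hand-written or generated configs from going unnoticed.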

Open items

  1. Update the placeholder handling in the current ingestion config file