Data lookup - anodot/daria GitHub Wiki

Lookup transformations

Lookup transformations take the original value and replace it with the corresponding value from a predefined mapping. Currently, only lookups in a CSV file are supported.

The lookups configuration is an object in which each key is a lookup name and each value is an object that configures that lookup.

lookups configuration

| Property | Required | Type | Description |
| --- | --- | --- | --- |
| type | yes | String | Type of a lookup. Currently, only file lookups are supported |
| format | yes | String | Format of a lookup file. Currently, only the CSV format is supported |
| path | yes | String | Path to the file with lookup data |

Example of a lookups configuration:

"lookups": {
  "region": {
    "type": "file",
    "format": "CSV",
    "path": "/home/test-datasets/topology/region_1653903037.csv"
  }
}
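
The required properties in the table above can be checked with a few lines of Python. This is only a minimal sketch, not the validation the tool itself performs: the function name validate_lookups is hypothetical, and treating "file" and "CSV" as exact, case-sensitive values is an assumption based on the example.

```python
def validate_lookups(lookups):
    """Return a list of problems; an empty list means the config looks valid.

    Hypothetical helper for illustration only.
    """
    problems = []
    for name, conf in lookups.items():
        # All three properties are required and must be strings.
        for prop in ("type", "format", "path"):
            if not isinstance(conf.get(prop), str):
                problems.append(f"{name}: '{prop}' is required and must be a string")
        # Assumption: values are compared exactly as written in the example.
        if "type" in conf and conf["type"] != "file":
            problems.append(f"{name}: unsupported type {conf['type']!r}")
        if "format" in conf and conf["format"] != "CSV":
            problems.append(f"{name}: unsupported format {conf['format']!r}")
    return problems


lookups = {
    "region": {
        "type": "file",
        "format": "CSV",
        "path": "/home/test-datasets/topology/region_1653903037.csv",
    }
}
print(validate_lookups(lookups))  # []
```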

For example, a CSV file might contain state abbreviations and their full names:

Abbreviation,Full_name
KS,Kansas
IA,Iowa

If the source data contains abbreviations, they can be replaced with the full names using a lookup. To do that, use a transformation configuration like this:

{
  "type": "lookup",
  "name": "STATES",
  "key": "Abbreviation",
  "value": "Full_name",
  "compare_function": "equals",
  "default": "other"
}

This lookup compares the original value, such as IA, with the values in the Abbreviation column of the states.csv file using the equals function. If a match is found, it returns the corresponding value from the Full_name column. If no match is found, it returns the default value, the string other.
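
The behaviour described above can be sketched in Python. This is an illustration under stated assumptions, not the tool's actual implementation: the lookup function is hypothetical, and the CSV content is inlined as a string rather than read from a file.

```python
import csv
import io


def lookup(rows, original, key, value, default):
    """Return row[value] for the first row whose row[key] equals original.

    Hypothetical helper that mirrors the "equals" compare function;
    returns `default` when no row matches.
    """
    for row in rows:
        if row[key] == original:
            return row[value]
    return default


# Inline stand-in for the CSV file with state abbreviations.
STATES_CSV = "Abbreviation,Full_name\nKS,Kansas\nIA,Iowa\n"
rows = list(csv.DictReader(io.StringIO(STATES_CSV)))

print(lookup(rows, "IA", "Abbreviation", "Full_name", "other"))  # Iowa
print(lookup(rows, "TX", "Abbreviation", "Full_name", "other"))  # other
```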

Lookup compare functions

equals - match only keys that are exactly equal to a provided value

startswith - match to keys that start with a provided value

contains - match to keys that have a provided value as a substring

regex_contains - match a provided value to keys as a regular expression
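
One possible reading of the four compare functions, written as Python predicates. The names COMPARE_FUNCTIONS and matching_keys are hypothetical. For regex_contains the sketch assumes that the provided value is the pattern and that it may match anywhere in the key; the description above does not state this explicitly.

```python
import re

# Each predicate takes a key from the lookup file and the provided value.
COMPARE_FUNCTIONS = {
    "equals": lambda key, provided: key == provided,
    "startswith": lambda key, provided: key.startswith(provided),
    "contains": lambda key, provided: provided in key,
    # Assumption: the provided value is the regular expression,
    # searched for anywhere in the key.
    "regex_contains": lambda key, provided: re.search(provided, key) is not None,
}


def matching_keys(keys, provided, compare_function):
    """Return the keys that match `provided` under the named compare function."""
    compare = COMPARE_FUNCTIONS[compare_function]
    return [key for key in keys if compare(key, provided)]


keys = ["KS", "KY", "IA"]
print(matching_keys(keys, "IA", "equals"))             # ['IA']
print(matching_keys(keys, "K", "startswith"))          # ['KS', 'KY']
print(matching_keys(keys, "A", "contains"))            # ['IA']
print(matching_keys(keys, "^K[SY]$", "regex_contains"))  # ['KS', 'KY']
```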

Additionally, the lookup named STATES must be configured under the lookups key at the root of the pipeline configuration (see the pipeline configuration example).
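
A small consistency check can catch a transformation that refers to a lookup missing from the root lookups key. This is a sketch only: the function missing_lookups and the path /path/to/states.csv are hypothetical, and the transformations are passed in as a separate list because their location inside the pipeline configuration is not shown here.

```python
def missing_lookups(pipeline_config, transformations):
    """Return names used by lookup transformations but absent from root "lookups".

    Hypothetical helper for illustration only.
    """
    configured = pipeline_config.get("lookups", {})
    return [
        t["name"]
        for t in transformations
        if t.get("type") == "lookup" and t["name"] not in configured
    ]


transformations = [
    {
        "type": "lookup",
        "name": "STATES",
        "key": "Abbreviation",
        "value": "Full_name",
        "compare_function": "equals",
        "default": "other",
    }
]

# Only "region" is configured, so STATES is reported as missing.
incomplete = {"lookups": {"region": {"type": "file", "format": "CSV", "path": "/home/test-datasets/topology/region_1653903037.csv"}}}
print(missing_lookups(incomplete, transformations))  # ['STATES']

# Hypothetical path; with STATES configured nothing is reported.
complete = {"lookups": {"STATES": {"type": "file", "format": "CSV", "path": "/path/to/states.csv"}}}
print(missing_lookups(complete, transformations))  # []
```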