Configuration - Archipanion/archipanion-engine-middlelayer GitHub Wiki

Set up the main config

In the main config, the following parameters must be specified:

  1. "queryConfigPath": The path to the query catalog settings (see the pipeline set section below).
  2. "schemas": For each schema, specify the schema name and a description.
  3. "apiEndpoint": The port on which the middleware serves its API.
  4. "engineEndpoints": For each schema, one can set the endpoint of a vitrivr engine instance.
{
  "queryConfigPath": "./queryCatalog/generalSet.json",
  "schemas": [
    {
      "schema": "ptt",
      "description": "The PTT schema. Containing videos and Images from the PTT dataset."
    },
    {
      "schema": "arc",
      "description": "Archipanion schema. For Demo purposes."
    },
    {
      "schema": "sfa",
      "description": "Swiss Federal Archives schema. For Demo purposes."
    }
  ],
  "apiEndpoint": {
    "port": 8085
  },
  "engineEndpoints": [
    {
      "ip": "localhost",
      "schema": "ptt",
      "port": 7070
    },
    {
      "ip": "localhost",
      "schema": "arc",
      "port": 7070
    },
    {
      "ip": "localhost",
      "schema": "sfa",
      "port": 7070
    }
  ]
}
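
As a quick sanity check on a config like the one above, the following minimal Python sketch lists schemas that are declared under "schemas" but have no entry under "engineEndpoints". The helper name `schemas_without_endpoint` is hypothetical and not part of the middleware, and whether the middleware itself requires an endpoint per schema is not stated here, so treat the result as a hint only.

```python
import json


def schemas_without_endpoint(config: dict) -> list[str]:
    """Return declared schema names that have no engine endpoint entry.

    Hypothetical helper for illustration; not part of the middleware.
    """
    declared = [s["schema"] for s in config.get("schemas", [])]
    served = {e["schema"] for e in config.get("engineEndpoints", [])}
    return [name for name in declared if name not in served]


# Shortened example config using the same keys as the main config above.
example = json.loads("""
{
  "queryConfigPath": "./queryCatalog/generalSet.json",
  "schemas": [
    {"schema": "ptt", "description": "The PTT schema."},
    {"schema": "sfa", "description": "Swiss Federal Archives schema."}
  ],
  "apiEndpoint": {"port": 8085},
  "engineEndpoints": [
    {"ip": "localhost", "schema": "ptt", "port": 7070}
  ]
}
""")

print(schemas_without_endpoint(example))  # ['sfa']
```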

Set up the pipeline set

The pipeline catalog is the “library index” of the queries. Every query pipeline that should be available in the catalog must be listed here, together with the schemas for which it is released.

  1. "setName": The set name, which is used for loading the configuration.
  2. "queries": For each query, specify the query name, description, path and schemas.
    1. "name": The name of the pipeline.
    2. "description": A clear description of the purpose, the models / features used and any further constraints.
    3. "path": The path to the pipeline file.
    4. "schemas": The name of each schema for which the query must be available.
{
  "setName": "generalSet",
  "queries": [
    {
      "name": "clip",
      "description": "This pipeline embeds the provided text and retrieves based on temporal clip features",
      "path": "./queryCatalog/generalSet/query-clip.json",
      "schemas": ["sfa","ptt","arc"]
    },
    {
      "name": "mlt",
      "description": "This pipeline searches with mlt",
      "path": "./queryCatalog/generalSet/query-mlt.json",
      "schemas": ["ptt","arc"]
    }
  ]
}
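
To illustrate how the "schemas" list of each query gates availability, here is a minimal Python sketch that returns the query names released for a given schema. The function name `queries_for_schema` is an assumption for illustration only; the middleware's own lookup may work differently.

```python
def queries_for_schema(query_set: dict, schema: str) -> list[str]:
    """Return the names of all queries released for the given schema.

    Hypothetical helper for illustration; not part of the middleware.
    """
    return [
        query["name"]
        for query in query_set.get("queries", [])
        if schema in query.get("schemas", [])
    ]


# Same structure as the pipeline set above (descriptions omitted).
general_set = {
    "setName": "generalSet",
    "queries": [
        {
            "name": "clip",
            "path": "./queryCatalog/generalSet/query-clip.json",
            "schemas": ["sfa", "ptt", "arc"],
        },
        {
            "name": "mlt",
            "path": "./queryCatalog/generalSet/query-mlt.json",
            "schemas": ["ptt", "arc"],
        },
    ],
}

print(queries_for_schema(general_set, "sfa"))  # ['clip']
print(queries_for_schema(general_set, "ptt"))  # ['clip', 'mlt']
```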

Pipeline template example

A pipeline template is the generalized form of a specific vitrivr engine pipeline. It is obtained by replacing the variable input values with the iterator placeholder %i.

The temporal clip query can serve as an example:

In the classic setting, the frontend would generate a field for each input, i.e. 0-clip, 1-clip ... n-clip, including the data.

```json
"inputs": {
   "0-clip":{"type":"TEXT","data":"lion"},
   "1-clip":{"type":"TEXT","data":"giraffe"}
},
```

The data must be removed from the template, as it is not yet known when the template is written. Furthermore, the numbering must be replaced by the placeholder %i.

```json
"inputs": {
   "%i-text": {"type": "TEXT"}
},
```
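
The two steps for inputs (drop the data, replace the numbering with %i) can be sketched in Python as follows. The function `inputs_to_template` and the regular expression are assumptions made for illustration; the middleware does not necessarily ship such a tool, and templates are typically written by hand. Note that the sketch keeps the original key suffix (clip), whereas the template above additionally renames it to text.

```python
import re

# Matches a leading numeric index followed by a dash, e.g. "0-" in "0-clip".
INDEX_PREFIX = re.compile(r"^\d+-")


def inputs_to_template(inputs: dict) -> dict:
    """Turn concrete, numbered inputs into a generic %i input template.

    Hypothetical helper for illustration; not part of the middleware.
    """
    template = {}
    for key, spec in inputs.items():
        generic_key = INDEX_PREFIX.sub("%i-", key)
        # Keep everything except the concrete data, which is only known at query time.
        template[generic_key] = {k: v for k, v in spec.items() if k != "data"}
    return template


concrete_inputs = {
    "0-clip": {"type": "TEXT", "data": "lion"},
    "1-clip": {"type": "TEXT", "data": "giraffe"},
}

print(inputs_to_template(concrete_inputs))  # {'%i-clip': {'type': 'TEXT'}}
```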

The same applies to the operations. Starting from the specific pipeline:

"operations":{
   "0-clip":{"type":"RETRIEVER","field":"clip","input":"clip0"},
   "0-lookup":{"type":"TRANSFORMER","transformerName":"FieldLookup","input":"clip0retrieve","properties":{"field":"time","keys":"start, end"}},
   "0-relations":{"type":"TRANSFORMER","transformerName":"RelationExpander","input":"clip0lookup","properties":{"outgoing":"partOf"}},
   "1-clip":{"type":"RETRIEVER","field":"clip","input":"clip1"},
   "1-lookup":{"type":"TRANSFORMER","transformerName":"FieldLookup","input":"clip1retrieve","properties":{"field":"time","keys":"start, end"}},
   "1-relations":{"type":"TRANSFORMER","transformerName":"RelationExpander","input":"clip1lookup","properties":{"outgoing":"partOf"}},
   "temporal":{"type":"AGGREGATOR","aggregatorName":"TemporalSequenceAggregator","inputs":["clip0relations","clip1relations"]},
   "score":{"type":"TRANSFORMER","transformerName":"ScoreAggregator","input":"temporal"},
   "filelookup":{"type":"TRANSFORMER","transformerName":"FieldLookup","input":"score","properties":{"field":"file","keys":"path"}}
},

One arrives at the generic template by replacing the indices with %i:

```json
  "operations": {
    "%i-clip" : {"type": "RETRIEVER", "name": "ClipRetriever", "input": "%i-text", "properties": {"field": "clip"}},
    "%i-lookup" : {"type": "TRANSFORMER", "name": "FieldLookup", "input": "%i-clip", "properties": {"field": "time", "keys": "start, end"}},
    "%i-relations" : {"type": "TRANSFORMER", "name": "RelationExpander", "input": "%i-lookup", "properties": {"outgoing": "partOf"}},
    "temporal" : {"type": "AGGREGATOR", "name": "TemporalSequenceAggregator", "input": "%i-relations"},
    "score" : {"type": "TRANSFORMER", "name": "ScoreAggregator",  "input": "temporal"},
    "lookup" : {"type": "TRANSFORMER", "name": "FieldLookup", "input": "score", "properties": {"field": "file", "keys": "path"}}
  },
```

If everything is done correctly, one ends up with a template such as:

{
  "name": "Temporal Clip Query",
  "description": "This pipeline takes a number of input texts and queries the retrievals for the given temporal sequence.",
  "inputs": {
    "%i-text": {"type": "TEXT"}
  },
  "operations": {
    "%i-clip" : {"type": "RETRIEVER", "name": "ClipRetriever", "input": "%i-text", "properties": {"field": "clip"}},
    "%i-lookup" : {"type": "TRANSFORMER", "name": "FieldLookup", "input": "%i-clip", "properties": {"field": "time", "keys": "start, end"}},
    "%i-relations" : {"type": "TRANSFORMER", "name": "RelationExpander", "input": "%i-lookup", "properties": {"outgoing": "partOf"}},
    "temporal" : {"type": "AGGREGATOR", "name": "TemporalSequenceAggregator", "input": "%i-relations"},
    "score" : {"type": "TRANSFORMER", "name": "ScoreAggregator",  "input": "temporal"},
    "lookup" : {"type": "TRANSFORMER", "name": "FieldLookup", "input": "score", "properties": {"field": "file", "keys": "path"}}
  },
  "context": {
    "global": {
      "limit": "1000"
    },
    "local" : {
      "lookup":{"field": "time", "keys": "start, end"},
      "relations" :{"outgoing": "partOf"},
      "filelookup": {"field": "file", "keys": "path"}
    }
  },
  "output": "lookup"
}

On the query side (in the UI), one then only has to send, e.g.:

{"inputs": {
   "1-input": {"type": "TEXT", "data": "Lion"},
   "2-input": {"type": "TEXT", "data": "Giraffe"},
   "3-input": {"type": "TEXT", "data": "Tiger"}
}}
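
To make the role of %i concrete, the following Python sketch expands a template for a query like the one above. It is a rough illustration under several assumptions and not the middleware's actual implementation: the function name `expand_template` is hypothetical, indices are assumed to start at 0 (as in the 0-clip, 1-clip example), query inputs are assumed to be assigned in the order of their numeric prefix, and a non-indexed operation whose "input" contains %i (here temporal) is assumed to fan out into an "inputs" list.

```python
import json


def expand_template(template: dict, query_inputs: dict) -> dict:
    """Expand a %i template into a concrete pipeline (illustrative sketch only)."""
    # Assumption: query inputs are ordered by their numeric key prefix ("1-input" -> 1).
    ordered = sorted(query_inputs.items(), key=lambda kv: int(kv[0].split("-", 1)[0]))
    count = len(ordered)

    # Assumption: the template declares exactly one input pattern, e.g. "%i-text".
    input_key, input_spec = next(iter(template["inputs"].items()))
    inputs = {
        input_key.replace("%i", str(i)): {**input_spec, "data": given["data"]}
        for i, (_, given) in enumerate(ordered)
    }

    operations = {}
    for name, op in template["operations"].items():
        if "%i" in name:
            # Indexed operation: emit one copy per input, substituting the index everywhere.
            for i in range(count):
                copy = json.loads(json.dumps(op).replace("%i", str(i)))
                operations[name.replace("%i", str(i))] = copy
        elif "%i" in str(op.get("input", "")):
            # Non-indexed operation consuming indexed results: fan out into a list.
            fanned = {k: v for k, v in op.items() if k != "input"}
            fanned["inputs"] = [op["input"].replace("%i", str(i)) for i in range(count)]
            operations[name] = fanned
        else:
            operations[name] = op

    return {**template, "inputs": inputs, "operations": operations}


template = {
    "inputs": {"%i-text": {"type": "TEXT"}},
    "operations": {
        "%i-clip": {"type": "RETRIEVER", "name": "ClipRetriever", "input": "%i-text", "properties": {"field": "clip"}},
        "%i-lookup": {"type": "TRANSFORMER", "name": "FieldLookup", "input": "%i-clip", "properties": {"field": "time", "keys": "start, end"}},
        "%i-relations": {"type": "TRANSFORMER", "name": "RelationExpander", "input": "%i-lookup", "properties": {"outgoing": "partOf"}},
        "temporal": {"type": "AGGREGATOR", "name": "TemporalSequenceAggregator", "input": "%i-relations"},
        "score": {"type": "TRANSFORMER", "name": "ScoreAggregator", "input": "temporal"},
        "lookup": {"type": "TRANSFORMER", "name": "FieldLookup", "input": "score", "properties": {"field": "file", "keys": "path"}},
    },
    "output": "lookup",
}

query = {
    "1-input": {"type": "TEXT", "data": "Lion"},
    "2-input": {"type": "TEXT", "data": "Giraffe"},
    "3-input": {"type": "TEXT", "data": "Tiger"},
}

expanded = expand_template(template, query)
print(sorted(expanded["inputs"]))                    # ['0-text', '1-text', '2-text']
print(expanded["operations"]["temporal"]["inputs"])  # ['0-relations', '1-relations', '2-relations']
```

With three query inputs, each of the three indexed operations is emitted three times, while temporal, score and lookup appear once.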