# Passthrough service
The Passthrough service is a very simple enrichment module that just emits every record it receives unchanged (with a few "advanced" options, described below).
There are a few different scenarios under which it is useful:
- Debugging
- Batch enrichment pipelines where all the required processing (if any) has been performed in the harvester
- As described under batch enrichment, the first element in the pipeline cannot have a grouping field. Therefore if the only "user processing" desired is after the grouping, the pipeline can be (a configuration sketch follows this list):

  `PASS -> (group) USER`

  (where USER might be eg the Javascript enrichment module)
- The grouping field might just be used to "rebalance" the processing to a desired number of nodes (eg for some reason only one mapper is generated, so it makes sense to reduce across the cluster), in which case the pipeline can be:

  `PASS/USER1 -> (group) PASS -> USER2`

  (PASS/USER1 depending on whether there is any pre-processing to be performed) - or even:

  `PASS/USER1 -> (group) PASS -> PASS -> USER2`

  (see here for a brief discussion on how the batching/performance profile depends on the location of the module(s) with `grouping_fields`).
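As an illustration, the `PASS -> (group) USER` case could look something like the following as a batch enrichment pipeline. This is only a sketch: it assumes the enrichment configuration fields referenced elsewhere on this page (`entry_point`, `module_name_or_id`, `grouping_fields`, `config`), plus an assumed `name` field for each stage; the grouping field and the user module id are placeholders.

```
[
    {
        "name": "pass",
        "entry_point": "com.ikanow.aleph2.analytics.services.PassthroughService"
    },
    {
        "name": "user_processing",
        "grouping_fields": [ "some_grouping_field" ],
        "module_name_or_id": "some_user_module_id",
        "config": { ... }
    }
]
```

Here the Passthrough stage satisfies the "no grouping on the first element" rule, and the grouping is declared on the user module that actually needs it.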
The Passthrough service does not execute any user/unsafe code, and is therefore safe to make available to any user with "write" permissions on a bucket. (In fact it cannot be hidden, since it is built into Aleph2 rather than being provided via a Shared Library jar.)

The Passthrough service contains no additional Aleph2 logging.
The enrichment configuration should look like the following (no `module_name_or_id` is needed because the Passthrough service is built into Aleph2):
```
{
    "entry_point": "com.ikanow.aleph2.analytics.services.PassthroughService",
    "config": { ... }
}
```
No `config` needs to be specified at all, and the Passthrough service will work as advertised.
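For example, the minimal configuration with no `config` at all would simply be:

```
{
    "entry_point": "com.ikanow.aleph2.analytics.services.PassthroughService"
}
```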
Alternatively, there are a few "advanced configurations", defined by the following schema inserted into the `config` field of the enrichment configuration:
```
{
    "output": string
}
```
Where "output"
is one of:
- `"$$internal"` - this is the default: the module emits every data object it receives.
- `"$$stop"` - the module discards every data object it receives. (This can be useful in complex Spark or Storm topologies where some paths emit data and some don't: for the paths that end with 3rd party modules that do emit, you can just chain one of these onto the end.)
- `<bucket path>` - the most useful "advanced configuration": each object is emitted externally (not internally) to the specified bucket path.
  - The `"$<field>"` parameter described below allows users dynamic control over which bucket a data object is routed to.
  - Don't forget to set the top-level `allowed_external_paths` when externally emitting data objects.
- `"$<field>"` - the behavior is determined by the contents of each data object's field (nesting via dot notation supported), ie that field should contain the value `"$$internal"`, `"$$stop"`, or a bucket path as above. (An example follows this list.)
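For illustration, this is what the external-emit and dynamic-routing cases could look like; the bucket path `/external/target/bucket` and the field name `routing.destination` are just placeholders, not values defined by Aleph2. A fixed external destination:

```
{
    "entry_point": "com.ikanow.aleph2.analytics.services.PassthroughService",
    "config": { "output": "/external/target/bucket" }
}
```

And per-object routing, where each data object's `routing.destination` field is expected to contain `"$$internal"`, `"$$stop"`, or a bucket path:

```
{
    "entry_point": "com.ikanow.aleph2.analytics.services.PassthroughService",
    "config": { "output": "$routing.destination" }
}
```

In both external-emit cases, remember to list the target bucket paths in the bucket's top-level `allowed_external_paths`.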