Stream Enrich - OXYGEN-MARKET/oxygen-market.github.io GitHub Wiki


Overview

Stream Enrich is an Amazon Kinesis app, written in Scala and using the Kinesis Client Library, which:

  1. Reads raw Snowplow events off a Kinesis stream populated by the Scala Stream Collector
  2. Validates each raw event
  3. Enriches each event (e.g. infers the user's location from their IP address)
  4. Writes the enriched Snowplow event to another Kinesis stream
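The four steps above can be pictured as a simple read, validate, enrich, write pipeline. The sketch below is hypothetical: the types `RawEvent` and `EnrichedEvent`, the functions `validate`, `enrich` and `run`, and the single validation rule are invented for illustration and are not Stream Enrich's actual API. Kinesis reading and writing is replaced by a plain iterator and callbacks, and routing failures to a separate "bad" sink is an assumption of the sketch.

```scala
// Hypothetical types: the real application works on Thrift-serialized
// collector payloads and the full enriched event model.
final case class RawEvent(querystring: Map[String, String], ipAddress: String)
final case class EnrichedEvent(fields: Map[String, String])

object PipelineSketch {
  // Step 2: validation. This sketch only requires an event type ("e") parameter.
  def validate(raw: RawEvent): Either[String, RawEvent] =
    if (raw.querystring.contains("e")) Right(raw)
    else Left(s"missing event type in ${raw.querystring}")

  // Step 3: enrichment. A stand-in that only copies two fields; a real
  // enrichment would also look up geolocation, parse the user agent, etc.
  def enrich(raw: RawEvent): EnrichedEvent =
    EnrichedEvent(Map(
      "event"          -> raw.querystring("e"),
      "user_ipaddress" -> raw.ipAddress
    ))

  // Steps 1 and 4: read each raw event from the source and write the result
  // to the "good" sink, or the validation error to the "bad" sink.
  def run(source: Iterator[RawEvent],
          good: EnrichedEvent => Unit,
          bad: String => Unit): Unit =
    source.foreach { raw =>
      validate(raw).map(enrich) match {
        case Right(enriched) => good(enriched)
        case Left(error)     => bad(error)
      }
    }
}
```

Keeping validation as an `Either` lets an invalid event short-circuit before enrichment without throwing.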

It is designed to be used downstream of the Scala Stream Collector.

It also supports reading raw events from stdin and writing enriched events to stdout, which is useful for debugging.

Stream Enrich uses the scala-common-enrich Scala project to enrich events, and the SnowplowRawEvent Thrift definition to read the Thrift-serialized objects collected by the Scala Stream Collector.

Stream Enrich output

The result of the enrichment process is a TSV (tab-separated values) representation of each event. The fields are listed below, grouped by category, in the order in which they appear in each line. For a description of each field, please refer to the Canonical Event Model.


The application (site, game, app, etc.) this event belongs to, and the tracker platform


  • app_id: String
  • platform: String

Date/time


  • etl_tstamp: String
  • collector_tstamp: String
  • dvce_created_tstamp: String

Transaction (i.e. this logging event)


  • event: String
  • event_id: String
  • txn_id: String

Versioning


  • name_tracker: String
  • v_tracker: String
  • v_collector: String
  • v_etl: String

User and visit


  • user_id: String
  • user_ipaddress: String
  • user_fingerprint: String
  • domain_userid: String
  • domain_sessionidx: Integer
  • network_userid: String

Location


  • geo_country: String
  • geo_region: String
  • geo_city: String
  • geo_zipcode: String
  • geo_latitude: Float
  • geo_longitude: Float
  • geo_region_name: String

Other IP lookups


  • ip_isp: String
  • ip_organization: String
  • ip_domain: String
  • ip_netspeed: String

Page


  • page_url: String
  • page_title: String
  • page_referrer: String

Page URL components


  • page_urlscheme: String
  • page_urlhost: String
  • page_urlport: Integer
  • page_urlpath: String
  • page_urlquery: String
  • page_urlfragment: String
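To illustrate how `page_url` relates to the six component fields, here is a minimal sketch using `java.net.URI`. It is not Stream Enrich's own parsing code and assumes an absolute URL; filling in port 443 for https and 80 otherwise when the URL has no explicit port is an assumption of this example. The same decomposition applies to the referrer URL and the `refr_url*` fields.

```scala
import java.net.URI

final case class UrlComponents(
  scheme: String,
  host: String,
  port: Int,
  path: String,
  query: Option[String],
  fragment: Option[String]
)

object UrlSketch {
  // Assumption: when no port is given, use 443 for https and 80 otherwise.
  private def defaultPort(scheme: String): Int = scheme match {
    case "https" => 443
    case _       => 80
  }

  def decompose(url: String): UrlComponents = {
    val uri    = new URI(url)
    val scheme = uri.getScheme
    UrlComponents(
      scheme   = scheme,
      host     = uri.getHost,
      // URI.getPort returns -1 when the URL has no explicit port.
      port     = if (uri.getPort == -1) defaultPort(scheme) else uri.getPort,
      path     = uri.getRawPath,
      query    = Option(uri.getRawQuery),
      fragment = Option(uri.getRawFragment)
    )
  }
}
```

For example, `UrlSketch.decompose("https://www.example.com/products/shoes?colour=red#reviews")` yields host `www.example.com`, port 443, path `/products/shoes`, query `colour=red` and fragment `reviews`.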

Referrer URL components


  • refr_urlscheme: String
  • refr_urlhost: String
  • refr_urlport: Integer
  • refr_urlpath: String
  • refr_urlquery: String
  • refr_urlfragment: String

Referrer details


  • refr_medium: String
  • refr_source: String
  • refr_term: String

Marketing


  • mkt_medium: String
  • mkt_source: String
  • mkt_term: String
  • mkt_content: String
  • mkt_campaign: String
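The marketing fields are typically filled from campaign parameters in the query string of `page_url`. The sketch below assumes the conventional `utm_*` parameter names (for example `utm_source` populating `mkt_source`); it is an illustration rather than the enrichment's actual code, and a deployment may be configured to read other parameters. `MarketingSketch` and its methods are hypothetical names.

```scala
import java.net.{URI, URLDecoder}

object MarketingSketch {
  // Assumed mapping from query-string parameter to enriched-event field.
  private val utmToField: Map[String, String] = Map(
    "utm_medium"   -> "mkt_medium",
    "utm_source"   -> "mkt_source",
    "utm_term"     -> "mkt_term",
    "utm_content"  -> "mkt_content",
    "utm_campaign" -> "mkt_campaign"
  )

  // Splits a raw query string such as "a=1&b=2" into decoded key/value pairs;
  // a parameter without "=" gets an empty value.
  def parseQuery(rawQuery: String): Map[String, String] =
    rawQuery.split("&").toList
      .filter(_.nonEmpty)
      .map { pair =>
        val parts = pair.split("=", 2)
        val key   = URLDecoder.decode(parts(0), "UTF-8")
        val value = if (parts.length > 1) URLDecoder.decode(parts(1), "UTF-8") else ""
        key -> value
      }
      .toMap

  // Returns only the mkt_* fields whose utm_* parameter is present in the URL.
  def marketingFields(pageUrl: String): Map[String, String] = {
    val query = Option(new URI(pageUrl).getRawQuery)
      .map(parseQuery)
      .getOrElse(Map.empty[String, String])
    utmToField.toList.flatMap { case (param, field) =>
      query.get(param).map(value => field -> value).toList
    }.toMap
  }
}
```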

Custom Contexts


  • contexts: String

Structured Event


  • se_category: String
  • se_action: String
  • se_label: String
  • se_property: String
  • se_value: String

Unstructured Event


  • unstruct_event: String

Ecommerce transaction (from querystring)


  • tr_orderid: String
  • tr_affiliation: String
  • tr_total: String
  • tr_tax: String
  • tr_shipping: String
  • tr_city: String
  • tr_state: String
  • tr_country: String

Ecommerce transaction item (from querystring)


  • ti_orderid: String
  • ti_sku: String
  • ti_name: String
  • ti_category: String
  • ti_price: String
  • ti_quantity: String

Page Pings


  • pp_xoffset_min: Integer
  • pp_xoffset_max: Integer
  • pp_yoffset_min: Integer
  • pp_yoffset_max: Integer

User Agent


  • useragent: String

Browser (from user-agent)


  • br_name: String
  • br_family: String
  • br_version: String
  • br_type: String
  • br_renderengine: String

Browser (from querystring)


  • br_lang: String
  • br_features_pdf: Byte
  • br_features_flash: Byte
  • br_features_java: Byte
  • br_features_director: Byte
  • br_features_quicktime: Byte
  • br_features_realplayer: Byte
  • br_features_windowsmedia: Byte
  • br_features_gears: Byte
  • br_features_silverlight: Byte
  • br_cookies: Byte
  • br_colordepth: String
  • br_viewwidth: Integer
  • br_viewheight: Integer
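The `Byte` fields in this group act as on/off flags. A small sketch for reading them, assuming they are serialised as `1` (true), `0` (false) or an empty value when the tracker did not send them; the object and method names are hypothetical.

```scala
object ByteFlagSketch {
  // Assumption: "1" means true, "0" means false; anything else (including an
  // empty field) is treated as missing.
  def toFlag(value: String): Option[Boolean] = value.trim match {
    case "1" => Some(true)
    case "0" => Some(false)
    case _   => None
  }

  // Collects the br_features_* flags present in an event given as a
  // field-name-to-value map, keyed by the feature name without its prefix.
  def browserFeatures(event: Map[String, String]): Map[String, Boolean] = {
    val names = List("pdf", "flash", "java", "director", "quicktime",
      "realplayer", "windowsmedia", "gears", "silverlight")
    names.flatMap { name =>
      event.get(s"br_features_$name").flatMap(toFlag).map(flag => name -> flag).toList
    }.toMap
  }
}
```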

OS (from user-agent)


  • os_name: String
  • os_family: String
  • os_manufacturer: String
  • os_timezone: String

Device/Hardware (from user-agent)


  • dvce_type: String
  • dvce_ismobile: Byte

Device (from querystring)


  • dvce_screenwidth: Integer
  • dvce_screenheight: Integer

Document


  • doc_charset: String
  • doc_width: Integer
  • doc_height: Integer

Currency


  • tr_currency: String
  • tr_total_base: String
  • tr_tax_base: String
  • tr_shipping_base: String
  • ti_currency: String
  • ti_price_base: String
  • base_currency: String

Geolocation


  • geo_timezone: String

Click ID


  • mkt_clickid: String
  • mkt_network: String
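`mkt_clickid` and `mkt_network` record an ad network's click identifier taken from the page URL's query string, together with the network it belongs to. The parameter-to-network table in this sketch (`gclid` for Google, `msclkid` for Microsoft, `dclid` for DoubleClick) is an assumption based on common ad-network conventions, and `ClickIdSketch` is a hypothetical name; it takes an already-decoded query-string map.

```scala
object ClickIdSketch {
  // Assumed mapping from click-ID query parameter to ad network name,
  // checked in this order.
  private val networks: List[(String, String)] = List(
    "gclid"   -> "Google",
    "msclkid" -> "Microsoft",
    "dclid"   -> "DoubleClick"
  )

  // Returns (mkt_clickid, mkt_network) for the first known parameter present.
  def clickId(query: Map[String, String]): Option[(String, String)] =
    networks.collectFirst {
      case (param, network) if query.contains(param) => (query(param), network)
    }
}
```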

ETL tags


  • etl_tags: String

Time event was sent


  • dvce_sent_tstamp: String

Referrer


  • refr_domain_userid: String
  • refr_dvce_tstamp: String

Derived contexts


  • derived_contexts: String

Session ID


  • domain_sessionid: String

Derived timestamp


  • derived_tstamp: String
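`derived_tstamp` estimates when the event really happened, correcting for an inaccurate device clock. A commonly described formulation, assumed here, is `collector_tstamp - (dvce_sent_tstamp - dvce_created_tstamp)`, falling back to `collector_tstamp` when either device timestamp is missing, and using `true_tstamp` when the tracker supplied one. The sketch works on `java.time.Instant` values rather than the TSV string format; `DerivedTimestampSketch` is a hypothetical name.

```scala
import java.time.{Duration, Instant}

object DerivedTimestampSketch {
  def derivedTstamp(
    collector: Instant,
    dvceCreated: Option[Instant],
    dvceSent: Option[Instant],
    trueTstamp: Option[Instant]
  ): Instant =
    // Assumption: a tracker-supplied true timestamp takes precedence.
    trueTstamp.getOrElse {
      (dvceCreated, dvceSent) match {
        // Subtract the delay between creation and sending, as measured on the
        // device clock, from the collector's (trusted) receipt time.
        case (Some(created), Some(sent)) =>
          collector.minus(Duration.between(created, sent))
        // Without both device timestamps, fall back to the collector time.
        case _ => collector
      }
    }
}
```

Only the difference between the two device timestamps is used, so a device clock that is offset by hours still yields a sensible result as long as it runs at the right rate.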

Derived event vendor/name/format/version


  • event_vendor: String
  • event_name: String
  • event_format: String
  • event_version: String
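For self-describing (unstructured) events, these four fields correspond to the parts of the event's Iglu schema URI, which has the form `iglu:vendor/name/format/version`. A minimal parsing sketch, assuming that URI form and a `model-revision-addition` version; the regular expression and the `SchemaKeySketch` names are illustrative, not the validation Snowplow itself performs.

```scala
object SchemaKeySketch {
  final case class SchemaKey(vendor: String, name: String, format: String, version: String)

  // iglu:<vendor>/<name>/<format>/<model>-<revision>-<addition>
  private val IgluUri = """iglu:([^/]+)/([^/]+)/([^/]+)/(\d+-\d+-\d+)""".r

  // Regex patterns in Scala match the whole string, so trailing junk is rejected.
  def parse(uri: String): Option[SchemaKey] = uri match {
    case IgluUri(vendor, name, format, version) => Some(SchemaKey(vendor, name, format, version))
    case _                                      => None
  }
}
```

For example, `iglu:com.snowplowanalytics.snowplow/link_click/jsonschema/1-0-1` splits into vendor `com.snowplowanalytics.snowplow`, name `link_click`, format `jsonschema` and version `1-0-1`.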

Event fingerprint


  • event_fingerprint: String
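`event_fingerprint` is a hash intended to help spot duplicate events. The sketch below assumes an approach in the spirit of an event fingerprint enrichment: drop the parameters expected to differ between duplicates, sort the rest by name, join them and take an MD5 hash. The excluded parameter names and the `key=value&...` joining format are assumptions of this sketch, not confirmed details of Stream Enrich; `FingerprintSketch` is a hypothetical name.

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

object FingerprintSketch {
  // Hypothetical exclusions: event ID, sent timestamp, collector version and
  // network user ID, which may differ between copies of the same event.
  val defaultExcluded: Set[String] = Set("eid", "stm", "cv", "nuid")

  def fingerprint(params: Map[String, String],
                  excluded: Set[String] = defaultExcluded): String = {
    // Sort by key so that parameter order does not affect the hash; the "="
    // and "&" separators keep ("a","bc") and ("ab","c") from colliding.
    val canonical = params.toList
      .filterNot { case (key, _) => excluded.contains(key) }
      .sortBy { case (key, _) => key }
      .map { case (key, value) => s"$key=$value" }
      .mkString("&")
    val digest = MessageDigest.getInstance("MD5")
      .digest(canonical.getBytes(StandardCharsets.UTF_8))
    // Render the 16-byte digest as 32 lowercase hex characters.
    digest.map(b => f"${b & 0xff}%02x").mkString
  }
}
```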

True timestamp


  • true_tstamp: String
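Because each enriched event is a single tab-separated line with the fields in the order listed above, a downstream consumer can turn it into a field-name-to-value map by splitting on tabs. A minimal sketch follows; `EnrichedTsv` and `parse` are hypothetical names, and the name list simply transcribes the 131 fields above. Passing `-1` to `split` keeps trailing empty fields, which Java's `String.split` would otherwise drop.

```scala
object EnrichedTsv {
  // Field names in the order in which they appear in each enriched TSV line.
  val FieldNames: Vector[String] = Vector(
    "app_id", "platform", "etl_tstamp", "collector_tstamp", "dvce_created_tstamp",
    "event", "event_id", "txn_id", "name_tracker", "v_tracker", "v_collector", "v_etl",
    "user_id", "user_ipaddress", "user_fingerprint", "domain_userid", "domain_sessionidx",
    "network_userid", "geo_country", "geo_region", "geo_city", "geo_zipcode",
    "geo_latitude", "geo_longitude", "geo_region_name", "ip_isp", "ip_organization",
    "ip_domain", "ip_netspeed", "page_url", "page_title", "page_referrer",
    "page_urlscheme", "page_urlhost", "page_urlport", "page_urlpath", "page_urlquery",
    "page_urlfragment", "refr_urlscheme", "refr_urlhost", "refr_urlport", "refr_urlpath",
    "refr_urlquery", "refr_urlfragment", "refr_medium", "refr_source", "refr_term",
    "mkt_medium", "mkt_source", "mkt_term", "mkt_content", "mkt_campaign", "contexts",
    "se_category", "se_action", "se_label", "se_property", "se_value", "unstruct_event",
    "tr_orderid", "tr_affiliation", "tr_total", "tr_tax", "tr_shipping", "tr_city",
    "tr_state", "tr_country", "ti_orderid", "ti_sku", "ti_name", "ti_category",
    "ti_price", "ti_quantity", "pp_xoffset_min", "pp_xoffset_max", "pp_yoffset_min",
    "pp_yoffset_max", "useragent", "br_name", "br_family", "br_version", "br_type",
    "br_renderengine", "br_lang", "br_features_pdf", "br_features_flash",
    "br_features_java", "br_features_director", "br_features_quicktime",
    "br_features_realplayer", "br_features_windowsmedia", "br_features_gears",
    "br_features_silverlight", "br_cookies", "br_colordepth", "br_viewwidth",
    "br_viewheight", "os_name", "os_family", "os_manufacturer", "os_timezone",
    "dvce_type", "dvce_ismobile", "dvce_screenwidth", "dvce_screenheight",
    "doc_charset", "doc_width", "doc_height", "tr_currency", "tr_total_base",
    "tr_tax_base", "tr_shipping_base", "ti_currency", "ti_price_base", "base_currency",
    "geo_timezone", "mkt_clickid", "mkt_network", "etl_tags", "dvce_sent_tstamp",
    "refr_domain_userid", "refr_dvce_tstamp", "derived_contexts", "domain_sessionid",
    "derived_tstamp", "event_vendor", "event_name", "event_format", "event_version",
    "event_fingerprint", "true_tstamp"
  )

  // Parses one enriched line. Returns Left if the field count is wrong;
  // otherwise a map that omits empty fields.
  def parse(line: String): Either[String, Map[String, String]] = {
    val values = line.split("\t", -1)
    if (values.length != FieldNames.length)
      Left(s"expected ${FieldNames.length} fields, got ${values.length}")
    else
      Right(FieldNames.zip(values).filter { case (_, v) => v.nonEmpty }.toMap)
  }
}
```

Omitting empty fields keeps the map small, since most events leave many of the 131 columns blank; typed conversion (e.g. `geo_latitude` to `Double`) can then be applied per field.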
