stream enrich - OXYGEN-MARKET/oxygen-market.github.io GitHub Wiki
Stream Enrich is an Amazon Kinesis app, written in Scala and using the Kinesis Client Library, which:
- Reads raw Snowplow events off a Kinesis stream populated by the Scala Stream Collector
- Validates each raw event
- Enriches each event (e.g. infers the location of the user from their IP address)
- Writes the enriched Snowplow event to another Kinesis stream
It is designed to be used downstream of the Scala Stream Collector.
It also supports reading raw events from stdin and writing enriched events to stdout, which is useful for debugging.
Stream Enrich uses the scala-common-enrich Scala project to enrich events, and SnowplowRawEvent to read the Thrift-serialized objects collected by the Scala Stream Collector.
The result of the enrichment process is a TSV representation of the event, broken down below. For a description of each field, please refer to the Canonical Event Model.
The application (site, game, app, etc.) this event belongs to, and the tracker platform
- app_id: String
- platform: String
Date/time
- etl_tstamp: String
- collector_tstamp: String
- dvce_created_tstamp: String
Transaction (i.e. this logging event)
- event: String
- event_id: String
- txn_id: String
Versioning
- name_tracker: String
- v_tracker: String
- v_collector: String
- v_etl: String
User and visit
- user_id: String
- user_ipaddress: String
- user_fingerprint: String
- domain_userid: String
- domain_sessionidx: Integer
- network_userid: String
Location
- geo_country: String
- geo_region: String
- geo_city: String
- geo_zipcode: String
- geo_latitude: Float
- geo_longitude: Float
- geo_region_name: String
Other IP lookups
- ip_isp: String
- ip_organization: String
- ip_domain: String
- ip_netspeed: String
Page
- page_url: String
- page_title: String
- page_referrer: String
Page URL components
- page_urlscheme: String
- page_urlhost: String
- page_urlport: Integer
- page_urlpath: String
- page_urlquery: String
- page_urlfragment: String
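To illustrate what the page URL component fields hold, here is a minimal Python sketch that splits a URL into the six parts named above using the standard library. It is not the enrichment code itself (which lives in scala-common-enrich); the function name `page_url_components` and the example URL are hypothetical, and the real enrichment may treat edge cases such as missing ports differently.

```python
from urllib.parse import urlsplit


def page_url_components(page_url):
    """Split a URL into the six page_url* fields (illustrative only)."""
    parts = urlsplit(page_url)
    return {
        "page_urlscheme": parts.scheme,
        "page_urlhost": parts.hostname,
        "page_urlport": parts.port,  # None when the URL carries no explicit port
        "page_urlpath": parts.path,
        "page_urlquery": parts.query,
        "page_urlfragment": parts.fragment,
    }


# Hypothetical example URL, not taken from real event data.
components = page_url_components(
    "http://www.example.com:8080/products/list?page=2#reviews"
)
for name, value in components.items():
    print(name, value, sep="\t")
```

The same split applies to the referrer URL component fields, with page_referrer as the input.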
Referrer URL components
- refr_urlscheme: String
- refr_urlhost: String
- refr_urlport: Integer
- refr_urlpath: String
- refr_urlquery: String
- refr_urlfragment: String
Referrer details
- refr_medium: String
- refr_source: String
- refr_term: String
Marketing
- mkt_medium: String
- mkt_source: String
- mkt_term: String
- mkt_content: String
- mkt_campaign: String
Custom Contexts
- contexts: String
Structured Event
- se_category: String
- se_action: String
- se_label: String
- se_property: String
- se_value: String
Unstructured Event
- unstruct_event: String
Ecommerce transaction (from querystring)
- tr_orderid: String
- tr_affiliation: String
- tr_total: String
- tr_tax: String
- tr_shipping: String
- tr_city: String
- tr_state: String
- tr_country: String
Ecommerce transaction item (from querystring)
- ti_orderid: String
- ti_sku: String
- ti_name: String
- ti_category: String
- ti_price: String
- ti_quantity: String
Page Pings
- pp_xoffset_min: Integer
- pp_xoffset_max: Integer
- pp_yoffset_min: Integer
- pp_yoffset_max: Integer
User Agent
- useragent: String
Browser (from user-agent)
- br_name: String
- br_family: String
- br_version: String
- br_type: String
- br_renderengine: String
Browser (from querystring)
- br_lang: String
- br_features_pdf: Byte
- br_features_flash: Byte
- br_features_java: Byte
- br_features_director: Byte
- br_features_quicktime: Byte
- br_features_realplayer: Byte
- br_features_windowsmedia: Byte
- br_features_gears: Byte
- br_features_silverlight: Byte
- br_cookies: Byte
- br_colordepth: String
- br_viewwidth: Integer
- br_viewheight: Integer
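Because every value arrives as text in the TSV, a consumer usually needs to convert fields according to the types given in these lists. The following Python sketch shows one way to do that. It assumes that an empty string means the field is absent and that Byte fields are encoded as "0" or "1"; both are assumptions to verify against your own data, and the helper name `cast_field` is hypothetical.

```python
def cast_field(type_name, raw):
    """Convert a raw TSV string to a Python value for the given type label.

    Assumptions (not confirmed by this page): "" means absent,
    and Byte fields are the strings "0" or "1".
    """
    if raw == "":
        return None
    if type_name == "String":
        return raw
    if type_name == "Integer":
        return int(raw)
    if type_name == "Float":
        return float(raw)
    if type_name == "Byte":
        if raw not in ("0", "1"):
            raise ValueError(f"unexpected Byte value: {raw!r}")
        return raw == "1"
    raise ValueError(f"unknown type label: {type_name!r}")


# Hypothetical raw values for a few of the browser fields listed above.
print(cast_field("Byte", "1"))        # br_features_pdf
print(cast_field("Integer", "1280"))  # br_viewwidth
print(cast_field("String", "en-GB"))  # br_lang
```

Returning None for empty strings keeps missing values distinct from a genuine 0 or False.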
OS (from user-agent)
- os_name: String
- os_family: String
- os_manufacturer: String
- os_timezone: String
Device/Hardware (from user-agent)
- dvce_type: String
- dvce_ismobile: Byte
Device (from querystring)
- dvce_screenwidth: Integer
- dvce_screenheight: Integer
Document
- doc_charset: String
- doc_width: Integer
- doc_height: Integer
Currency
- tr_currency: String
- tr_total_base: String
- tr_tax_base: String
- tr_shipping_base: String
- ti_currency: String
- ti_price_base: String
- base_currency: String
Geolocation
- geo_timezone: String
Click ID
- mkt_clickid: String
- mkt_network: String
ETL tags
- etl_tags: String
Time event was sent
- dvce_sent_tstamp: String
Referrer
- refr_domain_userid: String
- refr_dvce_tstamp: String
Derived contexts
- derived_contexts: String
Session ID
- domain_sessionid: String
Derived timestamp
- derived_tstamp: String
Derived event vendor/name/format/version
- event_vendor: String
- event_name: String
- event_format: String
- event_version: String
Event fingerprint
- event_fingerprint: String
True timestamp
- true_tstamp: String
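When debugging with stdout output, it can help to label the columns of an enriched TSV line. The Python sketch below maps the leading columns to field names. It assumes the TSV columns appear in the same order as the fields are listed on this page, and it names only the first twelve fields; extend `LEADING_FIELDS` with the remaining names to cover a full line. The function name and the sample values are hypothetical.

```python
# First twelve field names, in the order listed on this page.
# Assumption: the TSV column order matches this listing order.
LEADING_FIELDS = [
    "app_id", "platform",
    "etl_tstamp", "collector_tstamp", "dvce_created_tstamp",
    "event", "event_id", "txn_id",
    "name_tracker", "v_tracker", "v_collector", "v_etl",
]


def parse_enriched_prefix(line, fields=LEADING_FIELDS):
    """Map the leading tab-separated columns of one line to field names."""
    values = line.rstrip("\n").split("\t")
    if len(values) < len(fields):
        raise ValueError(
            f"expected at least {len(fields)} columns, got {len(values)}"
        )
    # Empty columns become None so absent values are easy to spot.
    return {name: (value or None) for name, value in zip(fields, values)}


# Hypothetical sample line; every value is invented for illustration.
sample = "\t".join([
    "shop", "web",
    "2016-01-01 00:00:02.000", "2016-01-01 00:00:01.000",
    "2016-01-01 00:00:00.000",
    "page_view", "example-event-id", "",
    "tracker-name", "tracker-version", "collector-version", "etl-version",
]) + "\n"

event = parse_enriched_prefix(sample)
print(event["app_id"], event["event"], event["txn_id"])
```

Splitting on the tab character alone, rather than on arbitrary whitespace, matters here because timestamp values contain spaces and empty columns must keep their position.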