Scala stream collector - OXYGEN-MARKET/oxygen-market.github.io GitHub Wiki

HOME > SNOWPLOW TECHNICAL DOCUMENTATION > Collectors

Introduction

The Scala Stream Collector is a Snowplow event collector for Snowplow, written in Scala. The Scala Stream Collector allows near-real time processing (Enrichment, Storage, Analytics) of a Snowplow raw event stream.

The Scala Stream Collector receives raw Snowplow events over HTTP, serializes them to a Thrift record format, and then writes them to a sink. Currently supported sinks are:

Amazon Kinesis
stdout for a custom stream collection process

Support for Apache Kafka may be added in the future - please see ticket #xxx for details.

Like the Clojure Collector, the Scala Stream Collector supports cross-domain Snowplow deployments, setting a user_id (used to identify unique visitors) server side to reliably identify the same user across domains.

How it works

User identification

The Scala Stream Collector allows the use of a third-party cookie, making user tracking across domains possible. The CloudFront Collector does not support cross domain tracking of users because user ids are set client-side, whereas the Scala Stream Collector sets them server-side.

In a nutshell: the Scala Stream Collector receives events from the Snowplow JavaScript tracker, sets/updates a third-party user tracking cookie, and returns the pixel to the client. The ID in this third-party user tracking cookie is stored in the network_userid field in Snowplow events.

In pseudocode terms:

if (request contains an "sp" cookie) {
    Record that cookie as the user identifier
    Set that cookie with a now+1 year cookie expiry
    Add the headers and payload to the output array
} else {
    Set the "sp" cookie with a now+1 year cookie expiry
    Add the headers and payload to the output array
}

Technical architecture

The Scala Stream Collector is built on top of Spray and Akka actors.

Scala stream collector - OXYGEN-MARKET/oxygen-market.github.io GitHub Wiki

Introduction

How it works

User identification

Technical architecture

See also

⚠️ GitHub.com Fallback ⚠️

Scala stream collector - OXYGEN-MARKET/oxygen-market.github.io GitHub Wiki

Introduction

How it works

User identification

Technical architecture

See also

⚠️ **GitHub.com Fallback** ⚠️

⚠️ GitHub.com Fallback ⚠️