The Event Flow in Detail - artsy/snowplow GitHub Wiki

  1. Tracker (embedded JS/Ruby code)
    • sends HTTP GET requests to http://snowplow-stream-collector.herokuapp.com/i with the Snowplow params
  2. Collector (Scala app)
    • input is HTTP GET requests to http://snowplow-stream-collector.herokuapp.com/i with the Snowplow params
    • output is the Kinesis stream plasma-production
  3. Enricher (Scala app)
    • input is the plasma-production stream on Kinesis
    • the app enriches each event, for example by geolocating IP addresses
    • output is the SnowplowEnriched stream on Kinesis
  4. Storage (Java app)
    • input is the SnowplowEnriched stream on Kinesis
    • the Java app batches many enriched events into TSV files, one event per line, and stores them on S3 in the artsy-plasma/enrich-out bucket. After this happens a few times...
    • the Java app creates a manifest file in which each line is an S3 filename, and writes it to the SnowplowRedshiftManifests Kinesis stream
    • once a buffer is filled... the Java app sends Redshift a command to import the files listed in the manifest into the table snowplowtable
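
The storage step above can be sketched in memory, without any AWS calls. This is a minimal illustration, not the Java app's actual code: the class `StorageSketch`, the thresholds `events_per_file` and `files_per_manifest`, the `part-NNNNN.tsv` file naming, and the `iam_role` argument are all hypothetical, and treating `artsy-plasma` as the bucket with `enrich-out` as a key prefix is an assumption. Redshift's `COPY ... MANIFEST` reads a JSON manifest with an `entries` list, so the sketch also shows one way a line-per-filename manifest could be converted before the import.

```python
import json
from typing import Dict, List


def to_tsv(events: List[List[str]]) -> str:
    """Render enriched events as TSV text, one event per line."""
    return "\n".join("\t".join(fields) for fields in events) + "\n"


class StorageSketch:
    """In-memory stand-in for the storage step (hypothetical, no AWS calls)."""

    def __init__(self, events_per_file: int, files_per_manifest: int) -> None:
        self.events_per_file = events_per_file        # hypothetical threshold
        self.files_per_manifest = files_per_manifest  # hypothetical threshold
        self.pending_events: List[List[str]] = []
        self.pending_files: List[str] = []
        self.s3: Dict[str, str] = {}     # stands in for S3: key -> file body
        self.manifests: List[str] = []   # stands in for SnowplowRedshiftManifests

    def add_event(self, fields: List[str]) -> None:
        """Buffer one enriched event; flush a TSV file when the buffer is full."""
        self.pending_events.append(fields)
        if len(self.pending_events) >= self.events_per_file:
            self._flush_file()

    def _flush_file(self) -> None:
        # Hypothetical key naming; the real app's file names are not given.
        key = f"artsy-plasma/enrich-out/part-{len(self.s3):05d}.tsv"
        self.s3[key] = to_tsv(self.pending_events)
        self.pending_events = []
        self.pending_files.append(key)
        # After enough files, emit a manifest: one S3 filename per line.
        if len(self.pending_files) >= self.files_per_manifest:
            self.manifests.append("\n".join(self.pending_files) + "\n")
            self.pending_files = []


def redshift_manifest(manifest_text: str) -> str:
    """Convert a line-per-filename manifest into Redshift's JSON manifest."""
    entries = [
        {"url": f"s3://{key}", "mandatory": True}
        for key in manifest_text.splitlines()
        if key
    ]
    return json.dumps({"entries": entries})


def copy_statement(table: str, manifest_url: str, iam_role: str) -> str:
    """Build a Redshift COPY that imports the tab-delimited files in a manifest."""
    return (
        f"COPY {table} FROM '{manifest_url}' "
        f"IAM_ROLE '{iam_role}' DELIMITER '\\t' MANIFEST;"
    )
```

With `StorageSketch(events_per_file=2, files_per_manifest=2)`, four calls to `add_event` produce two TSV files and one manifest. Passing that manifest through `redshift_manifest` and then calling `copy_statement("snowplowtable", ...)` with the uploaded manifest's URL gives the kind of import command the last bullet describes.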