The Event Flow in Detail
Tracker (embedded JS/Ruby code)
Sends HTTP GET requests to http://snowplow-stream-collector.herokuapp.com/i carrying the Snowplow params.
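As a rough illustration of what such a request looks like, the sketch below builds a collector URL in Python. The helper name `build_tracker_url` and the example values are hypothetical, and the parameter names (`e`, `p`, `url`, `aid`) are assumed from the Snowplow tracker protocol rather than taken from this page; the real JS/Ruby trackers send many more fields.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Collector endpoint as given on this page.
COLLECTOR_URL = "http://snowplow-stream-collector.herokuapp.com/i"


def build_tracker_url(params: dict) -> str:
    """Return the GET URL a tracker would request (hypothetical helper)."""
    # urlencode percent-escapes values so they are safe in a query string.
    return COLLECTOR_URL + "?" + urlencode(params)


# Illustrative page-view event; the values are made up for this sketch.
example_url = build_tracker_url({
    "e": "pv",                      # event type: page view (assumed name)
    "p": "web",                     # platform (assumed name)
    "url": "http://example.com/",   # page URL (assumed name)
    "aid": "demo-app",              # application id (assumed name)
})

# Parsing the query string back recovers the original params.
example_query = parse_qs(urlsplit(example_url).query)
```

The URL is only built here, not requested, so the sketch can be run without touching the production collector.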
Collector (Scala app)
Input is HTTP GET requests to http://snowplow-stream-collector.herokuapp.com/i with the Snowplow params.
Output is the Kinesis stream plasma-production.
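The collector's role (HTTP request in, stream record out) can be sketched as follows. This is not the Scala collector's code: `InMemoryStream` is a hypothetical stand-in for the Kinesis stream, and the JSON serialization, the `user_ipaddress` field, and the use of the client IP as partition key are all assumptions made for illustration.

```python
import json
from urllib.parse import urlsplit, parse_qsl


class InMemoryStream:
    """Hypothetical stand-in for a Kinesis stream; keeps records in a list."""

    def __init__(self, name: str):
        self.name = name
        self.records = []

    def put_record(self, data: bytes, partition_key: str) -> None:
        self.records.append((partition_key, data))


def collect(request_url: str, client_ip: str, stream: InMemoryStream) -> bool:
    """Turn one GET request to /i into one stream record."""
    parts = urlsplit(request_url)
    if parts.path != "/i":
        return False  # not a tracking request, ignore it
    payload = dict(parse_qsl(parts.query))
    payload["user_ipaddress"] = client_ip  # assumed field name
    # JSON is an assumption; this page does not state the wire format.
    stream.put_record(json.dumps(payload).encode("utf-8"), partition_key=client_ip)
    return True


raw_stream = InMemoryStream("plasma-production")
accepted = collect(
    "http://snowplow-stream-collector.herokuapp.com/i?e=pv&p=web",
    "203.0.113.7",  # documentation-range IP, made up for this sketch
    raw_stream,
)
```

Swapping `InMemoryStream` for a real Kinesis client would keep the same shape: one request becomes one record on the raw stream.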
Enricher (Scala app)
Input is the plasma-production stream on Kinesis.
The app enriches the raw events, for example by geolocating IP addresses.
Output is the SnowplowEnriched stream on Kinesis.
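To make the enrichment step concrete, here is a minimal sketch of IP geolocation as a pure function from a raw event to an enriched one. The lookup table `FAKE_GEO_DB`, the function names, and the field names (`user_ipaddress`, `geo_country`, `geo_city`) are assumptions for illustration; the real enricher presumably consults a proper geolocation database and applies other enrichments as well.

```python
from typing import Callable, Optional

# Hypothetical stand-in for a real IP geolocation database.
FAKE_GEO_DB = {
    "203.0.113.7": {"geo_country": "US", "geo_city": "New York"},
}


def lookup_geo(ip: str) -> Optional[dict]:
    """Return geo fields for an IP, or None when the IP is unknown."""
    return FAKE_GEO_DB.get(ip)


def enrich(raw_event: dict,
           geo_lookup: Callable[[str], Optional[dict]] = lookup_geo) -> dict:
    """Return a copy of the event with geo fields added when available."""
    enriched = dict(raw_event)  # never mutate the raw event
    geo = geo_lookup(raw_event.get("user_ipaddress", ""))
    if geo:
        enriched.update(geo)
    return enriched


known = enrich({"e": "pv", "user_ipaddress": "203.0.113.7"})
unknown = enrich({"e": "pv", "user_ipaddress": "198.51.100.1"})
```

Events whose IP cannot be resolved pass through unchanged, so a failed lookup does not drop data from the stream.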
Storage (Java app)
Input is the SnowplowEnriched stream on Kinesis.
The Java app batches many events into TSV files, with one enriched event per line, stored on S3 in the artsy-plasma/enrich-out bucket. After this happens a few times...
The Java app creates a manifest file in which each line is an S3 filename. This manifest file is written to the SnowplowRedshiftManifests Kinesis stream.
Once a buffer is filled... the Java app sends a command to Redshift to do a manifest import into the table snowplowtable.
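The storage steps can be sketched end to end as below. This is an illustration under stated assumptions, not the Java app's code: all function names, the bucket `example-bucket`, the file keys, and the IAM role are hypothetical. Redshift's `COPY ... MANIFEST` expects a JSON manifest with an `entries` list, so the sketch assumes the line-per-filename manifest is converted to that form before the import; this page does not say how the Java app actually does it.

```python
import json


def to_tsv(events: list, columns: list) -> str:
    """Render one enriched event per line, tab-separated."""
    rows = ("\t".join(str(e.get(c, "")) for c in columns) for e in events)
    return "\n".join(rows) + "\n"


def line_manifest(s3_keys: list) -> str:
    """Manifest as described on this page: one S3 filename per line."""
    return "\n".join(s3_keys) + "\n"


def redshift_manifest(manifest_text: str, bucket: str) -> str:
    """Convert the line manifest to Redshift's JSON manifest (assumed step)."""
    entries = [
        {"url": f"s3://{bucket}/{key}", "mandatory": True}
        for key in manifest_text.splitlines() if key
    ]
    return json.dumps({"entries": entries})


def copy_statement(table: str, manifest_url: str, iam_role: str) -> str:
    """Build a manifest-based COPY command for tab-delimited files."""
    return (f"COPY {table} FROM '{manifest_url}' "
            f"IAM_ROLE '{iam_role}' DELIMITER '\\t' MANIFEST;")


tsv = to_tsv([{"event": "pv", "geo_country": "US"}], ["event", "geo_country"])
manifest = line_manifest(["enrich-out/part-0001.tsv", "enrich-out/part-0002.tsv"])
json_manifest = redshift_manifest(manifest, "example-bucket")
sql = copy_statement(
    "snowplowtable",
    "s3://example-bucket/manifests/batch-0001.json",   # hypothetical location
    "arn:aws:iam::123456789012:role/example-role",     # hypothetical role
)
```

Importing by manifest rather than by key prefix means Redshift loads exactly the files listed, which matters when new TSV files keep arriving in the same location.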