Installing the StorageLoader


  1. Assumptions
  2. Dependencies
  3. Installation
  4. Configuration
  5. Next steps

1. Assumptions

This guide assumes that you have administrator access to a Unix-based server (e.g. Ubuntu, OS X, Fedora) on which you can install StorageLoader.

You might wish to try out the steps that show how to set up an EC2 instance via the AWS CLI.

2. Dependencies

2.1 Software

The StorageLoader requires Java 7+ to run.
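A quick way to confirm this requirement is to inspect the output of `java -version`. The sketch below is only an illustration: the helper name `java_major_version` is made up for this example, and it assumes the usual quoted version banner (for example `java version "1.7.0_80"`).

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the major Java version from a version string.
# Handles the legacy "1.x" scheme ("1.7.0_80" is Java 7) and the newer one ("11.0.2").
java_major_version() {
  local v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;          # legacy scheme: drop the leading "1."
  esac
  printf '%s\n' "${v%%[._-]*}"   # keep everything before the first separator
}

# Java prints its version banner on stderr, so redirect it before parsing.
if command -v java >/dev/null 2>&1; then
  raw=$(java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}')
  major=$(java_major_version "$raw")
  if [ "${major:-0}" -ge 7 ] 2>/dev/null; then
    echo "Java $major found: OK"
  else
    echo "Java 7+ required, found: ${raw:-unknown}" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```

If the check reports an older version or no Java at all, install a Java 7+ runtime before continuing.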

If you are loading Snowplow events into a PostgreSQL database, the StorageLoader must run on the same server as PostgreSQL. This is because it downloads the files locally, and Postgres needs to be able to ingest the data from the local file system.

2.2 S3 buckets

StorageLoader moves the Snowplow event files through two distinct S3 buckets during the load process. These buckets are as follows:

  1. In Bucket - contains the Snowplow event files to process
  2. Archive Bucket - where StorageLoader moves the Snowplow event files after successful loading

The In Bucket for StorageLoader is the same as the Out Bucket for the EmrEtlRunner - i.e. you will already have set up this bucket.

We recommend creating a new folder for the Archive Bucket - i.e. do not re-use EmrEtlRunner's own Archive Bucket. Create the required Archive Bucket in the same AWS region as your In Bucket.
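One way to create the Archive Bucket is with the AWS CLI, assuming it is installed and configured with credentials. The sketch below is illustrative only: the bucket name, the region, and the `create_archive_bucket` helper are placeholders invented for this example. It defaults to a dry run that only prints the `aws s3 mb` command so you can review it first.

```shell
#!/usr/bin/env bash
# Placeholder values (assumptions): replace with your own bucket name and
# the region of your In Bucket.
ARCHIVE_BUCKET="my-snowplow-storage-archive"
REGION="us-east-1"

# Hypothetical helper: create the bucket, or only print the command when
# DRY_RUN is 1 (the default here).
create_archive_bucket() {
  local bucket="$1" region="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "aws s3 mb s3://$bucket --region $region"
  else
    aws s3 mb "s3://$bucket" --region "$region"
  fi
}

create_archive_bucket "$ARCHIVE_BUCKET" "$REGION"
```

Run the script with `DRY_RUN=0` once the printed command looks right.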

Right, now we can install StorageLoader.

3. Installation

We host StorageLoader on the distribution platform JFrog Bintray. If you have already completed the EmrEtlRunner installation, you can skip this step. Otherwise, you can get a copy of the StorageLoader as shown below.

Note: Please follow this link if you wish to get a different version of the loader. The distribution name follows the pattern snowplow_emr_{{RELEASE_VERSION}}.zip.

$ wget http://dl.bintray.com/snowplow/snowplow-generic/snowplow_emr_r88_angkor_wat.zip

The archive contains both EmrEtlRunner and StorageLoader. Unzip the archive:

$ unzip snowplow_emr_r88_angkor_wat.zip

You will see two files, snowplow-emr-etl-runner and snowplow-storage-loader. The second one is the actual StorageLoader.
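The download and unzip steps above can be sketched as a small script. This is only an illustration: the helper names `snowplow_archive_url` and `check_snowplow_files` are made up for this example, and `RELEASE_VERSION` stands for whichever release you chose.

```shell
#!/usr/bin/env bash
# Base URL taken from the wget command above.
BASE_URL="http://dl.bintray.com/snowplow/snowplow-generic"

# Hypothetical helper: build the archive URL for a release version,
# following the pattern snowplow_emr_<version>.zip.
snowplow_archive_url() {
  printf '%s/snowplow_emr_%s.zip\n' "$BASE_URL" "$1"
}

# Hypothetical helper: return 0 only if both expected files exist in the
# given directory (defaults to the current directory).
check_snowplow_files() {
  local dir="${1:-.}" f missing=0
  for f in snowplow-emr-etl-runner snowplow-storage-loader; do
    if [ -f "$dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   wget "$(snowplow_archive_url "$RELEASE_VERSION")"
#   unzip "snowplow_emr_${RELEASE_VERSION}.zip"
#   check_snowplow_files .
```

A non-zero exit status from `check_snowplow_files` means the archive did not unpack as expected.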

4. Configuration

StorageLoader requires a YAML-format configuration file and a storage targets configuration to run. The YAML file should be the same file you use to configure the EmrEtlRunner. See Common configuration for more information on its format.

5. Next steps

All done? You have the StorageLoader installed! Now find out how to use it.
