1: Installing the StorageLoader
This guide assumes that you have administrator access to a Unix-based server (e.g. Ubuntu, OS X, Fedora) on which you can install StorageLoader.
You might wish to follow the steps that show how to set up an EC2 instance via the AWS CLI.
The StorageLoader requires Java 7+ to run.
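The Java requirement above can be checked before installing. The following is a minimal sketch, not part of the StorageLoader distribution: `java_major` is a hypothetical helper that reduces a version string to its major number, assuming the usual `1.7.0_80` (legacy) or `11.0.2` (newer) formats printed by `java -version`.

```shell
#!/bin/sh
# Hypothetical helper: print the major Java version for a version string.
# Handles the legacy "1.7.0_80" scheme and the newer "11.0.2" scheme.
java_major() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;   # legacy scheme: drop the leading "1."
  esac
  # Keep only the leading run of digits.
  echo "${v%%[!0-9]*}"
}

if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr, e.g.: java version "1.7.0_80"
  version=$(java -version 2>&1 | awk -F '"' '/version/ {print $2; exit}')
  major=$(java_major "$version")
  if [ -n "$major" ] && [ "$major" -ge 7 ]; then
    echo "Java $version found: OK"
  else
    echo "Java $version found: StorageLoader needs Java 7 or later"
  fi
else
  echo "Java not found on PATH"
fi
```

If the check fails, install a suitable JRE with your platform's package manager before continuing.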
If you are loading Snowplow events into a PostgreSQL database, the StorageLoader must run on the same server as PostgreSQL. This is because it downloads the event files locally, and Postgres needs to be able to ingest the data from the local file system.
StorageLoader moves the Snowplow event files through two distinct S3 buckets during the load process. These buckets are as follows:
- In Bucket - contains the Snowplow event files to process
- Archive Bucket - where StorageLoader moves the Snowplow event files after successful loading
The In Bucket for StorageLoader is the same as the Out Bucket for the EmrEtlRunner - i.e. you will already have set up this bucket.
We recommend creating a new folder for the Archive Bucket - i.e. do not re-use EmrEtlRunner's own Archive Bucket. Create the required Archive Bucket in the same AWS region as your In Bucket.
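One way to create the archive location is with the AWS CLI. This is only a sketch: the bucket name, prefix and region below are hypothetical placeholders, and the `run` wrapper is an illustrative helper that prints the commands by default so you can review them before anything is created.

```shell
#!/bin/sh
# Hypothetical names: replace with your own bucket, prefix and region.
ARCHIVE_BUCKET="my-snowplow-archive"
ARCHIVE_PREFIX="storage-loader/"
REGION="us-east-1"   # should match the region of your In Bucket

# Print each command by default; set DRY_RUN=0 to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Create the bucket in the chosen region.
run aws s3 mb "s3://$ARCHIVE_BUCKET" --region "$REGION"
# Create an empty "folder" object (S3 folders are key prefixes ending in "/").
run aws s3api put-object --bucket "$ARCHIVE_BUCKET" --key "$ARCHIVE_PREFIX" --region "$REGION"
```

Running the script with `DRY_RUN=0` requires the AWS CLI to be installed and configured with credentials that are allowed to create buckets.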
Right, now we can install StorageLoader.
We host StorageLoader on the distribution platform JFrog Bintray. If you have already completed the EmrEtlRunner installation, you can skip this step. Otherwise, you can get a copy of the StorageLoader as shown below.
Note: Please follow this link if you wish to get a different version of the loader. The distribution name follows the pattern snowplow_emr_{{RELEASE_VERSION}}.zip.
$ wget http://dl.bintray.com/snowplow/snowplow-generic/snowplow_emr_r88_ankgor_wat.zip
The archive contains both EmrEtlRunner and StorageLoader. Unzip the archive:
$ unzip snowplow_emr_r88_ankgor_wat.zip
You will see two files, snowplow-emr-etl-runner and snowplow-storage-loader, where the second one is the actual StorageLoader.
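To confirm that the archive unpacked as described, a small check such as the following can be run from the directory where you unzipped it. It is a sketch only; `check_snowplow_files` is a hypothetical helper, not something shipped with Snowplow.

```shell
#!/bin/sh
# Hypothetical helper: report whether the two expected files are present
# in a directory (defaults to the current directory).
# Returns non-zero if either file is missing.
check_snowplow_files() {
  dir="${1:-.}"
  missing=0
  for f in snowplow-emr-etl-runner snowplow-storage-loader; do
    if [ -f "$dir/$f" ]; then
      echo "found:   $f"
    else
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

check_snowplow_files . || echo "Unzip the archive in this directory first."
```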
StorageLoader requires a YAML configuration file and storage target configuration files to run. The YAML file should be the same one you use to configure the EmrEtlRunner. See Common configuration for more information on its format.
All done? You have the StorageLoader installed! Now find out how to use it.