AWS Lambda Source setup
This tool builds into a single zip file that can be uploaded as a new AWS Lambda function. The description you give your Lambda is used as your collector endpoint (a URL), which means the Lambda you create needs permission to call `lambda:GetFunctionConfiguration` on itself, in addition to the usual AWS Lambda permissions.
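The deploy script described below takes care of wiring up these permissions for you; if you prefer to manage the function's IAM role yourself, a minimal sketch of the self-lookup grant might look like this (the role name, function name, region, and account ID are placeholders, not part of the bundle):

```bash
# Hypothetical names: adjust the role name, region, account ID and function
# name to match the Lambda that deploy.py creates for you.
cat > self-get-config-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": "lambda:GetFunctionConfiguration",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:snowplow-s3-source"
  }]
}
EOF

# Attach the policy inline to the function's execution role
aws iam put-role-policy \
  --role-name snowplow-s3-source-role \
  --policy-name AllowSelfGetFunctionConfiguration \
  --policy-document file://self-get-config-policy.json
```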
This deployment assumes you have the AWS CLI and Python 2.7 installed.
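A quick way to confirm both prerequisites before proceeding:

```bash
aws --version       # should print an aws-cli version string
python --version    # should report Python 2.7.x
```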
```bash
sudo pip install pyyaml
wget https://bintray.com/artifact/download/snowplow/snowplow-generic/snowplow_aws_lambda_source_0.1.0_bundle.zip
unzip snowplow_aws_lambda_source_0.1.0_bundle.zip -d snowplow_aws_lambda_source_0.1.0_bundle
cd snowplow_aws_lambda_source_0.1.0_bundle
```
At this point, fire up your favourite editor and edit the file `config.yaml` (see below). This file configures your deployment. It is written in YAML and looks like this:
```yaml
snowplow:
  collector: http://a-collector.com
  app_id: your_app_id
s3:
  buckets:
    - bucket-a
    - bucket-b
    - bucket-c
```
The `collector` field is the URL of your collector, for example `http://com.acme:8080/`. Under `s3`, the `buckets` field is an array of the buckets you wish to monitor; these buckets must exist before deployment and will not be created by the script. The `app_id` field is the namespace to give your events, for example `com.snowplow.s3lambda`.

N.B. bucket names here should not include the `s3://` prefix.
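Since the script will not create missing buckets, it can be worth checking that each one is reachable before deploying. A small pre-flight check, using the bucket names from the example config above:

```bash
# head-bucket exits non-zero if a bucket is missing or inaccessible
for b in bucket-a bucket-b bucket-c; do
  aws s3api head-bucket --bucket "$b" && echo "$b is reachable"
done
```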
After you've edited the configuration file, you can deploy the AWS Lambda source and its required permissions using:
```bash
python deploy.py
```
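Once the script finishes, you can sanity-check the result with the AWS CLI. This is a sketch assuming deploy.py wires the buckets to the function via S3 event notifications; the function name below is a placeholder, so use whatever name the script registered:

```bash
# The function's Description field should hold your collector endpoint URL
aws lambda get-function-configuration \
  --function-name snowplow-s3-source \
  --query 'Description'

# Each monitored bucket should now have an event notification
# pointing at the Lambda function
aws s3api get-bucket-notification-configuration --bucket bucket-a
```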
If you are running the Snowplow batch (Hadoop) flow with Amazon Redshift, you should deploy the relevant event table into your Amazon Redshift cluster. You can find the table definition here:

Make sure to deploy this table into the same schema as your `events` table.
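As a sketch of that last step, assuming you have saved the table definition locally as `table-def.sql` (a hypothetical filename) and that your `events` table lives in the conventional `atomic` schema:

```bash
# psql 9.6+ runs multiple -c/-f arguments in a single session,
# so the search_path set here applies to the file that follows.
psql -h your-cluster.example.us-east-1.redshift.amazonaws.com \
     -p 5439 -U your_user -d your_db \
     -c 'SET search_path TO atomic;' \
     -f table-def.sql
```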