Configuring the Clojure collector - winlinvip/snowplow GitHub Wiki

There are several settings to configure for your Clojure collector application:

Enable SSH access to your Elastic Beanstalk environment and set your instances to have EBS-backed root devices
Enable connection draining for your Elastic Beanstalk load balancer
Configure autoscaling settings

Enable SSH access to your Elastic Beanstalk environment and set your instances to have EBS-backed root devices

Enable SSH access

This lets you SSH into one or more of your instances in the unlikely event that the rotation of collector logs from the instances to S3 fails.

In the Elastic Beanstalk user interface, select your Clojure collector app, select Configuration and then Instances. You should see a dropdown for selecting an EC2 keypair.

Select the keypair you want to use to SSH into the instances. Keep the corresponding private key safe so that it is available if you need it.
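If you prefer to script this step instead of using the console, the key pair can also be set as an Elastic Beanstalk option. The sketch below is a minimal, hypothetical helper (`key_pair_option` and `ssh_command` are illustrative names, not part of Snowplow). It assumes the `EC2KeyName` option in the `aws:autoscaling:launchconfiguration` namespace, and an Amazon Linux based platform whose default SSH user is `ec2-user`; the key name and host are placeholders.

```python
# Sketch only: key pair names, key files and hosts are hypothetical placeholders.


def key_pair_option(key_name: str) -> dict:
    """Return the Elastic Beanstalk option setting that selects the EC2 key pair.

    The dict uses the Namespace/OptionName/Value shape expected by the
    OptionSettings list of boto3's elasticbeanstalk update_environment().
    """
    if not key_name:
        raise ValueError("key pair name must not be empty")
    return {
        "Namespace": "aws:autoscaling:launchconfiguration",
        "OptionName": "EC2KeyName",
        "Value": key_name,
    }


def ssh_command(key_file: str, host: str, user: str = "ec2-user") -> list:
    """Return the ssh argument list for logging in to one collector instance.

    Assumes an Amazon Linux based platform, whose default login user is ec2-user.
    """
    return ["ssh", "-i", key_file, f"{user}@{host}"]


if __name__ == "__main__":
    print(key_pair_option("snowplow-collector-key"))
    print(" ".join(ssh_command("snowplow-collector-key.pem", "203.0.113.10")))
```

The option dict could then be passed as `OptionSettings=[key_pair_option("your-key")]` to `update_environment()`, together with your own environment name.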

Use EBS-backed instances

Next, configure the instances to use an EBS-backed root volume. That way, in the unlikely event that SSH access does not work, we can take a snapshot of the instance's root volume and retrieve the logs from the snapshot.

From the Root volume type dropdown, select General Purpose (SSD).

When you are done, click the Save button.
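The same choice, and the snapshot fallback described above, can be sketched in code. This is an illustrative sketch, not part of Snowplow: the helper names are hypothetical, and it assumes the `RootVolumeType` option of `aws:autoscaling:launchconfiguration` (where General Purpose (SSD) corresponds to `gp2`) and the `VolumeId`/`Description` arguments of boto3's EC2 `create_snapshot()` call. The volume ID is a placeholder.

```python
# Sketch only: helper names and the volume ID are hypothetical.

# Console "Root volume type" label -> value accepted by the RootVolumeType option.
ROOT_VOLUME_TYPES = {
    "Magnetic": "standard",
    "General Purpose (SSD)": "gp2",
    "Provisioned IOPS (SSD)": "io1",
}


def root_volume_option(console_label: str = "General Purpose (SSD)") -> dict:
    """Option setting equivalent to picking a root volume type in the console."""
    return {
        "Namespace": "aws:autoscaling:launchconfiguration",
        "OptionName": "RootVolumeType",
        "Value": ROOT_VOLUME_TYPES[console_label],
    }


def snapshot_request(volume_id: str) -> dict:
    """Keyword arguments for boto3's ec2 create_snapshot() call.

    Only needed in the fallback case where SSH fails and the logs have to be
    recovered from a snapshot of the instance's EBS root volume.
    """
    if not volume_id.startswith("vol-"):
        raise ValueError("expected an EBS volume ID such as vol-0123456789abcdef0")
    return {
        "VolumeId": volume_id,
        "Description": "Clojure collector root volume (log recovery)",
    }


if __name__ == "__main__":
    print(root_volume_option())
    print(snapshot_request("vol-0123456789abcdef0"))
```

A snapshot could then be requested with `boto3.client("ec2").create_snapshot(**snapshot_request(volume_id))` and attached to a new volume to read the logs.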

Enable connection draining for your Elastic Beanstalk load balancer

Enabling connection draining ensures that, if you scale down your cluster, you do not lose any log files generated on the machines being terminated.

To do this, navigate to the EC2 section of the AWS console and select Load Balancers from the left-hand menu. Select the relevant load balancer from the list and click the Edit button in the Connection Draining section to enable it, as shown in the diagrams below.
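For reference, here is a sketch of the same change expressed as API parameters. It assumes the environment uses a Classic Load Balancer, which accepts draining timeouts from 1 to 3600 seconds. Two equivalent routes are shown: the `ConnectionDrainingEnabled`/`ConnectionDrainingTimeout` options in the `aws:elb:policies` namespace, or the `LoadBalancerAttributes` argument of boto3's `elb` `modify_load_balancer_attributes()`. The helper names and the 300-second timeout are illustrative choices, not Snowplow recommendations.

```python
# Sketch only: helper names and the default timeout are illustrative.


def _check_timeout(timeout_seconds: int) -> None:
    # Classic Load Balancers accept draining timeouts from 1 to 3600 seconds.
    if not 1 <= timeout_seconds <= 3600:
        raise ValueError("connection draining timeout must be 1-3600 seconds")


def connection_draining_options(timeout_seconds: int = 300) -> list:
    """Elastic Beanstalk option settings that enable connection draining
    on the environment's Classic Load Balancer."""
    _check_timeout(timeout_seconds)
    return [
        {"Namespace": "aws:elb:policies",
         "OptionName": "ConnectionDrainingEnabled", "Value": "true"},
        {"Namespace": "aws:elb:policies",
         "OptionName": "ConnectionDrainingTimeout", "Value": str(timeout_seconds)},
    ]


def connection_draining_attributes(timeout_seconds: int = 300) -> dict:
    """LoadBalancerAttributes argument for boto3's elb
    modify_load_balancer_attributes(), i.e. the change made directly on
    the load balancer, like the console steps."""
    _check_timeout(timeout_seconds)
    return {"ConnectionDraining": {"Enabled": True, "Timeout": timeout_seconds}}


if __name__ == "__main__":
    print(connection_draining_options())
    print(connection_draining_attributes())
```

Setting it through the Elastic Beanstalk options has the advantage that the setting is kept if Elastic Beanstalk recreates the load balancer.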

Configure autoscaling settings

Now that the cluster is configured so that we don't lose any data if there is a problem, we need to set it up to scale up gracefully to handle spikes in traffic.

In the Elastic Beanstalk interface, select your Clojure collector app, select Configuration and then Scaling.

We recommend the following settings:

Auto Scaling

Minimum instance count: 2
Maximum instance count: more than 5

Scaling Trigger

Trigger measurement: CPUUtilization
Trigger statistic: Maximum
Unit of measurement: Percent
Measurement period: 5 minutes
Breach duration: 5 minutes
Upper threshold: less than 60
Upper breach scale increment: 1
Lower threshold: 5
Lower breach scale increment: 0
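The settings above can be sketched as Elastic Beanstalk option settings, for example to keep them under version control. This assumes the `aws:autoscaling:asg` options `MinSize`/`MaxSize` and the `aws:autoscaling:trigger` options, where `Period` and `BreachDuration` are given in minutes. Because the recommendations for the maximum instance count and upper threshold are ranges, the concrete defaults below (6 instances, 50 percent) are hypothetical picks within those ranges.

```python
# Sketch only: max_size=6 and upper_threshold=50 are hypothetical values
# chosen to satisfy "more than 5" and "less than 60".


def scaling_options(max_size: int = 6, upper_threshold: int = 50) -> list:
    """Return the recommended scaling settings as OptionSettings entries."""
    if max_size <= 5:
        raise ValueError("recommended maximum instance count is more than 5")
    if not 0 < upper_threshold < 60:
        raise ValueError("recommended upper threshold is below 60 percent")
    settings = {
        "aws:autoscaling:asg": {"MinSize": 2, "MaxSize": max_size},
        "aws:autoscaling:trigger": {
            "MeasureName": "CPUUtilization",
            "Statistic": "Maximum",
            "Unit": "Percent",
            "Period": 5,              # minutes
            "BreachDuration": 5,      # minutes
            "UpperThreshold": upper_threshold,
            "UpperBreachScaleIncrement": 1,
            "LowerThreshold": 5,
            "LowerBreachScaleIncrement": 0,  # never remove instances automatically
        },
    }
    # Flatten into the Namespace/OptionName/Value shape; values are strings.
    return [
        {"Namespace": namespace, "OptionName": name, "Value": str(value)}
        for namespace, options in settings.items()
        for name, value in options.items()
    ]


if __name__ == "__main__":
    for option in scaling_options():
        print(option)
```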

Note that these settings cause your cluster to scale up to handle a spike in traffic, but not to scale down after traffic drops, because the lower breach scale increment is 0. The reason is that scaling down automatically risks losing collector logs that have not been rotated to S3 before the instance is shut down. If you need to scale a cluster down, we recommend doing so manually, as described here.
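To see why a lower breach scale increment of 0 prevents scale-down, here is a deliberately simplified model of the scaling trigger. It is an illustration only: real Elastic Beanstalk scaling goes through CloudWatch alarms and Auto Scaling cooldowns, which this sketch ignores, and the maximum of 6 instances and upper threshold of 50 percent are hypothetical values within the recommended ranges.

```python
# Simplified model of the scaling trigger; not Elastic Beanstalk's actual logic.


def next_instance_count(current: int, max_cpu: float, *, minimum: int = 2,
                        maximum: int = 6, upper: float = 50, lower: float = 5,
                        upper_increment: int = 1, lower_increment: int = 0) -> int:
    """Instance count after one breach period.

    max_cpu is the Maximum CPUUtilization (percent) seen over the period.
    """
    if max_cpu > upper:
        current += upper_increment
    elif max_cpu < lower:
        current += lower_increment  # 0 means "never scale down automatically"
    # The Auto Scaling group never goes outside its min/max instance counts.
    return max(minimum, min(maximum, current))


def simulate(cpu_per_period: list, start: int = 2) -> list:
    """Instance counts after each period for a sequence of CPU readings."""
    counts = []
    count = start
    for cpu in cpu_per_period:
        count = next_instance_count(count, cpu)
        counts.append(count)
    return counts


if __name__ == "__main__":
    # A traffic spike followed by a drop: the cluster grows but never shrinks.
    print(simulate([30, 80, 85, 90, 20, 2, 1]))  # [2, 3, 4, 5, 5, 5, 5]
```

With a negative `lower_increment` (for example -1), the same quiet periods would start removing instances, which is exactly the situation where unrotated logs could be lost.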

Next: Enable support for HTTPS