Amazon AWS Waffle Server deployment - Gapminder/waffle-server GitHub Wiki

Waffle Server deployment topology and design choices

We recommend reading the design choices document before going further. It explains the technologies and terms used below.

Deployment tools

To deploy Waffle Server (WS) you need the following tools installed:

Docker - used by our scripts for creating docker images.
AWS CLI - used by our scripts for uploading docker images and managing ECS clusters.
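
Before running the deployment scripts it can help to confirm that both tools are on your PATH. The sketch below is not part of the WS repository: missing_tools is a hypothetical helper name, and it relies only on the standard command -v lookup.

```shell
#!/bin/sh
# Hypothetical helper (not part of the WS repo): print the name of every
# given command that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Docker and the AWS CLI (installed as "aws") are the tools listed above.
missing="$(missing_tools docker aws)"
if [ -n "$missing" ]; then
  echo "Install before deploying:" $missing >&2
else
  echo "Docker and AWS CLI are installed."
fi
```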

All deployment scripts for WS are stored in the deployment directory, which has the following structure:

cf_scripts - contains scripts for creating Docker images, uploading them to Amazon ECR, and deploying the ECS cluster. It also contains a template file for the settings required to build and deploy the images and the cluster.
cloudformation - contains templates for Amazon CloudFormation. These are currently used for the WS production deployment.
haproxy_docker - contains scripts and tools (including a Dockerfile) needed to set up the haproxy load balancer.
rsys_conf - contains configuration for the rsyslog tool, which lets us gather logs from all WS cluster machines in one place.

Using cf_scripts

For more detailed information on how to use the scripts in the cf_scripts folder, see its readme.

In cf_scripts you can see:

awsecrops - manages Docker images: uploads them to, and removes them from, Amazon ECR.
awstackops - manages ECS clusters (creates and deletes them).
SETTINGS - template file that should help you in composing your own settings.

The same SETTINGS file can be used by both the awsecrops and awstackops utilities.

The name of the config file you create from the template is not important.

Settings available for adjustments (* symbol means required)

Amazon AWS - Account

AWS_ACCESS_KEY_ID * - Amazon access key ID
AWS_SECRET_ACCESS_KEY * - Amazon secret access key
AWS_SSH_KEY_NAME * - name of the SSH key generated for you in your Amazon account

Amazon AWS - ECS and ECS cluster

STACK_NAME * - name of the stack that will be used by ECS.
HAPROXY_ECR_NAME * - name for the haproxy Docker image. An image with this name will be uploaded to Amazon ECR and used by ECS to create the load balancer machine for the cluster.
NODE_ECR_NAME * - name for the WS Docker image. An image with this name will be uploaded to Amazon ECR and used to create the Waffle Server machines in the cluster (including the Thrashing Machine).
AWS_DEFAULT_REGION - Amazon AWS region in which the machines will be deployed. By default: us-east-1
MAX_CLUSTER_INSTANCES - maximum number of instances in the cluster. Upper bound for upscaling. By default: 10
MIN_CLUSTER_INSTANCES - minimum number of instances in the cluster. Lower bound for downscaling. By default: 2
REDIS_INSTANCE_TYPE - ElastiCache instance type. By default: cache.m4.large
ECS_INSTANCE_TYPE_LB - instance type of the load balancer machine. By default: t2.medium
ECS_INSTANCE_TYPE - instance type of the Waffle Server machines that serve data to users. By default: t2.medium
ECS_INSTANCE_TYPE_TM - instance type of the Waffle Server Thrashing Machine. This machine is used for dataset imports, incremental updates and Redis cache warmup. By default: m4.xlarge
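
The defaults listed above can be expressed with the shell's `${VAR:=default}` expansion. This is only a sketch of the documented values; it is an assumption that the cf_scripts utilities apply defaults in a comparable way, and both function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: fill in the documented default for every optional
# cluster setting that is unset or empty.
apply_cluster_defaults() {
  : "${AWS_DEFAULT_REGION:=us-east-1}"
  : "${MAX_CLUSTER_INSTANCES:=10}"
  : "${MIN_CLUSTER_INSTANCES:=2}"
  : "${REDIS_INSTANCE_TYPE:=cache.m4.large}"
  : "${ECS_INSTANCE_TYPE_LB:=t2.medium}"
  : "${ECS_INSTANCE_TYPE:=t2.medium}"
  : "${ECS_INSTANCE_TYPE_TM:=m4.xlarge}"
}

# Succeed only when the scaling bounds are consistent (MIN <= MAX).
cluster_bounds_ok() {
  [ "$MIN_CLUSTER_INSTANCES" -le "$MAX_CLUSTER_INSTANCES" ]
}

apply_cluster_defaults
if cluster_bounds_ok; then
  echo "cluster size: $MIN_CLUSTER_INSTANCES..$MAX_CLUSTER_INSTANCES in $AWS_DEFAULT_REGION"
else
  echo "MIN_CLUSTER_INSTANCES must not exceed MAX_CLUSTER_INSTANCES" >&2
fi
```

Values that are already exported, for example by sourcing your settings file first, are left untouched.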

Waffle Server Application related

DEFAULT_USER_PASSWORD * - password which will be used by WS default user
MONGODB_URL * - URL of the MongoDB instance, in connection string format
NEWRELIC_KEY - NewRelic API key used to monitor the Waffle Server state. No monitoring is set up by default.
THRASHING_MACHINE - when set, this should have the value "THRASHING_MACHINE". It defines whether the Thrashing Machine should exist in the cluster. This flag changes the default Waffle Server behavior and configuration, tuning it to be WRITE-oriented rather than READ-oriented.
NODE_ENV - environment in which WS is going to be deployed. Possible values are:
- production
- stage
- development (By default)
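
A quick way to catch a missing required setting (the ones marked with * above) before starting a deployment is sketched below. The helper name is hypothetical, and the check only looks at variables in the current shell, so it assumes you have already loaded your settings into it.

```shell
#!/bin/sh
# Names of the settings marked with * in the lists above.
REQUIRED_SETTINGS="AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SSH_KEY_NAME STACK_NAME HAPROXY_ECR_NAME NODE_ECR_NAME DEFAULT_USER_PASSWORD MONGODB_URL"

# Hypothetical helper: print every required setting that is unset or
# empty in the current shell, one name per line.
unset_required_settings() {
  for name in $REQUIRED_SETTINGS; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
}

missing="$(unset_required_settings)"
if [ -n "$missing" ]; then
  echo "Missing required settings:" $missing >&2
fi
```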

Production SETTINGS file example

export AWS_ACCESS_KEY_ID=AAAAAAAAAAAAAAAAAAAA
export AWS_SECRET_ACCESS_KEY=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
export NEWRELIC_KEY="BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"
export AWS_SSH_KEY_NAME="john_doe_gapminder"

export AWS_DEFAULT_REGION="eu-west-1"

export HAPROXY_ECR_NAME="ws-prod-hp-ecr"
export NODE_ECR_NAME="ws-prod-ecr"
export STACK_NAME="wsprodstack"

export REDIS_INSTANCE_TYPE="cache.m4.large"

export MIN_CLUSTER_INSTANCES=3
export MAX_CLUSTER_INSTANCES=10

export MONGODB_URL="mongodb://user:[email protected]:27000/waffle_server"
export DEFAULT_USER_PASSWORD=YAHOOO

export NODE_ENV="production"

Development SETTINGS file example

export AWS_ACCESS_KEY_ID=AAAAAAAAAAAAAAAAAAAA
export AWS_SECRET_ACCESS_KEY=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
export NEWRELIC_KEY="BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"
export AWS_SSH_KEY_NAME="john_doe_gapminder"

export AWS_DEFAULT_REGION="eu-west-1"

export HAPROXY_ECR_NAME="ws-dev-hp-ecr-typescript"
export NODE_ECR_NAME="ws-dev-ecr-typescript"
export STACK_NAME="wsdevstack-typescript"

export REDIS_INSTANCE_TYPE="cache.t2.medium"

export MAX_CLUSTER_INSTANCES=3
export MIN_CLUSTER_INSTANCES=1

export MONGODB_URL="mongodb://user:[email protected]:27017/ws_ddf"
export DEFAULT_USER_PASSWORD=YAHOOO

export NODE_ENV="development"

Stage SETTINGS file example

export AWS_ACCESS_KEY_ID=AAAAAAAAAAAAAAAAAAAA
export AWS_SECRET_ACCESS_KEY=AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
export NEWRELIC_KEY="BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB"
export AWS_SSH_KEY_NAME="john_doe_gapminder"

export AWS_DEFAULT_REGION="eu-west-1"

export HAPROXY_ECR_NAME="ws-stage-hp-ecr"
export NODE_ECR_NAME="ws-stage-ecr"
export STACK_NAME="wsstagestack"

export REDIS_INSTANCE_TYPE="cache.t2.small"

export MAX_CLUSTER_INSTANCES=3
export MIN_CLUSTER_INSTANCES=1

export MONGODB_URL="mongodb://user:[email protected]:27017/ws_ddf"
export DEFAULT_USER_PASSWORD=YAHOOO

export NODE_ENV="stage"
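
The example files above consist only of export lines, so they can be read by any POSIX shell. The sketch below shows one way to inspect a single value without polluting your own environment. It is an assumption that the cf_scripts utilities load the file in a similar way, and read_setting is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print the value that a settings file gives to one
# variable. The file is sourced in a subshell, so nothing leaks into the
# calling shell.
read_setting() {
  ( . "$1" && eval "printf '%s\n' \"\${$2:-}\"" )
}

# Demo with a throwaway file that mimics part of the stage example above.
demo_file="$(mktemp)"
cat > "$demo_file" <<'EOF'
export STACK_NAME="wsstagestack"
export NODE_ENV="stage"
EOF
echo "stack: $(read_setting "$demo_file" STACK_NAME), env: $(read_setting "$demo_file" NODE_ENV)"
rm -f "$demo_file"
```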

Thrashing Machine

This machine has 2 main responsibilities:

DDF dataset importing and incremental updates - data imports and updates are quite expensive in terms of resources (CPU, RAM).
Cache warmup - done by sending the most popular queries to the WS Thrashing Machine.

These two responsibilities require a lot of resources and should not affect the user experience. Because of this, we decided to dedicate a separate machine to them.

The Thrashing Machine is not used for executing DDFQL requests from users. It is used only by WS-CLI.

The Thrashing Machine shares the MongoDB database and the Redis cache with the Waffle Server instances in the ECS cluster.

Let's get our hands dirty

Imagine that we have cloned the WS repo and are in the project root folder, and that we have created a MY_SETTINGS.txt file filled in with the options described above. To deploy Waffle Server:

Copy the SSH private and public keys (example: dev & dev.pub) to the WS root directory.
Execute ./deployment/cf_scripts/awsecrops buildnpush MY_SETTINGS.txt.
- This builds a Docker image from the codebase in the current directory, using its Dockerfile.
- Once the build has finished, it pushes the image to Amazon ECR under the name specified in your MY_SETTINGS.txt.
Once the image has been pushed, execute ./deployment/cf_scripts/awstackops create_stack MY_SETTINGS.txt.
- This reads STACK_NAME from the settings file and sends a create stack command to AWS, along with the other settings required for cluster creation.
Wait until the stack is created. Besides watching the terminal, you can follow the progress in the Amazon CloudFormation view.
Once the stack is created, make the cluster publicly available:
- Go to the Amazon dashboard that lists your EC2 instances and copy two IPs:
  - From the instance whose name follows the pattern ${STACK_NAME}-wsLoadBalancer.
  - From the instance whose name follows the pattern ${STACK_NAME}-wsThrashindMachine.
- Go to CloudFlare (DNS section):
  - For PRODUCTION, choose the gapminder.org domain and find:
    - the waffle-server DNS record - set it to the IP taken from ${STACK_NAME}-wsLoadBalancer
    - the import-waffle-server DNS record - set it to the IP taken from ${STACK_NAME}-wsThrashindMachine
  - For DEVELOPMENT, choose the gapminderdev.org domain and find:
    - the waffle-server-dev DNS record - set it to the IP taken from ${STACK_NAME}-wsLoadBalancer
    - the import-waffle-server-dev DNS record - set it to the IP taken from ${STACK_NAME}-wsThrashindMachine
  - For STAGE, choose the gapminderdev.org domain and find:
    - the waffle-server-stage DNS record - set it to the IP taken from ${STACK_NAME}-wsLoadBalancer
    - the import-waffle-server-stage DNS record - set it to the IP taken from ${STACK_NAME}-wsThrashindMachine
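
The steps above can be sketched as a small dry-run wrapper. The awsecrops and awstackops invocations are the ones given in this document. The run, deploy and dns_records helpers are hypothetical, and the final aws cloudformation wait stack-create-complete call is an addition of this sketch (awstackops may already block until the stack is ready). With DRY_RUN=1, which is the default here, nothing is executed and the commands are only printed.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Build and push the image, create the stack, then wait until
# CloudFormation reports that stack creation is complete.
deploy() {
  settings_file="$1"
  stack_name="$2"
  run ./deployment/cf_scripts/awsecrops buildnpush "$settings_file" &&
    run ./deployment/cf_scripts/awstackops create_stack "$settings_file" &&
    run aws cloudformation wait stack-create-complete --stack-name "$stack_name"
}

# Print the two DNS records to update for an environment, in the form
# "<record>.<domain> <- <EC2 instance name to take the IP from>".
dns_records() {
  env_name="$1"
  stack_name="$2"
  case "$env_name" in
    production)  domain="gapminder.org";    suffix="" ;;
    development) domain="gapminderdev.org"; suffix="-dev" ;;
    stage)       domain="gapminderdev.org"; suffix="-stage" ;;
    *) echo "unknown environment: $env_name" >&2; return 1 ;;
  esac
  echo "waffle-server${suffix}.${domain} <- ${stack_name}-wsLoadBalancer"
  echo "import-waffle-server${suffix}.${domain} <- ${stack_name}-wsThrashindMachine"
}

deploy MY_SETTINGS.txt wsstagestack
dns_records stage wsstagestack
```

The DNS records themselves still have to be edited by hand in CloudFlare; dns_records only lists which record takes which instance's IP.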

Amazon ECS cleanup (WARNING: DON'T DELETE STACK UNLESS YOU ARE SURE IT IS NOT USED!)

The names of the CloudFormation stack and of the Docker images uploaded to Amazon ECR are unique, so you will eventually want to delete obsolete images and unused stacks. Here is how:

To delete Docker images from Amazon ECR, execute ./deployment/cf_scripts/awsecrops cleanup MY_SETTINGS.txt. The script takes the names of the images to remove from the HAPROXY_ECR_NAME and NODE_ECR_NAME variables.
To delete a stack, execute ./deployment/cf_scripts/awstackops delete_stack MY_SETTINGS.txt. The script takes the name of the stack to remove from the STACK_NAME variable.
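
Because deleting the wrong stack is costly, one option is to put a confirmation step in front of the two cleanup commands. The sketch below is a hypothetical guard, not something the repository provides. It only prints the commands from this section, and only when the operator retypes the stack name exactly.

```shell
#!/bin/sh
# Hypothetical guard: succeed only when the operator retypes the exact,
# non-empty name of the stack that is about to be deleted.
confirm_stack_delete() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Print the cleanup commands for a settings file once the confirmation
# matches. Nothing is executed; copy the printed lines to run them.
cleanup_plan() {
  settings_file="$1"
  stack_name="$2"
  typed_name="$3"
  if ! confirm_stack_delete "$stack_name" "$typed_name"; then
    echo "Refusing: '$typed_name' does not match stack '$stack_name'" >&2
    return 1
  fi
  echo "./deployment/cf_scripts/awsecrops cleanup $settings_file"
  echo "./deployment/cf_scripts/awstackops delete_stack $settings_file"
}

cleanup_plan MY_SETTINGS.txt wsstagestack wsstagestack
```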

Waffle Server setup facts

If one of the machines in the cluster goes down, it is restarted automatically.
To see the haproxy stats for the machines it load balances, open http://${IP_TAKEN_FROM_MACHINE_IN_AWS_WHICH_NAME_ENDS_WITH_wsLoadBalancer}:8080/hps?12
Upscaling happens when:
- MemoryReservation >= 90 for 2 consecutive periods of 300 seconds
Downscaling happens when:
- MemoryReservation <= 40 for 2 consecutive periods of 300 seconds
A separate Redis instance is created for each cluster.
MongoDB is managed by MongoDB Cloud Manager and is independent of the cluster. The DEV and STAGE Waffle Server clusters do not use Cloud Manager; they use standalone instances with MongoDB installed.
Waffle Server traffic is heavily cached by CloudFlare via Page Rules.
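
The scaling thresholds above can be restated as a small decision function. This is an illustration of the rule only, not the way the alarms are defined in the CloudFormation templates; scaling_action is a hypothetical name, and its two arguments stand for the MemoryReservation values of two consecutive 300-second periods.

```shell
#!/bin/sh
# Hypothetical helper: given MemoryReservation for two consecutive
# 300-second periods, print "upscale", "downscale" or "none".
scaling_action() {
  awk -v first="$1" -v second="$2" 'BEGIN {
    a = first + 0; b = second + 0   # force numeric comparison
    if (a >= 90 && b >= 90)      print "upscale"
    else if (a <= 40 && b <= 40) print "downscale"
    else                         print "none"
  }'
}

scaling_action 92 95    # both periods at or above 90
scaling_action 92 70    # only one period above the threshold
scaling_action 35 12.5  # both periods at or below 40
```

A single period above 90 or below 40 is not enough to trigger scaling; both consecutive periods must cross the same threshold.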