Waffle Server: Overview - Gapminder/waffle-server GitHub Wiki

In this section we will look at waffle-server from a 10,000-foot view.

The diagram above shows several elements that collaborate with each other. Let's look at each of them more closely:

AWS:ECS:Cluster

Waffle Server gets deployed to AWS infrastructure using CloudFormation and ECS.

After WS is deployed, the following entities are present in AWS:

  • HAProxy - a load balancer that distributes requests among Waffle Server instances.

  • Load Balancing Group of Waffle Server machines - this group processes DDFQL requests and produces WS-JSON responses.

  • Waffle Server Thrashing Machine has two primary responsibilities:

    • The Thrashing Machine is the only machine WS-CLI talks to, so all dataset imports, incremental updates, and dataset configuration are executed here.
    • This machine warms up the Redis cache by executing the 500 most recent DDFQL requests sent to machines in the Load Balancing Group.

    It is worth noting that this machine DOESN'T execute DDFQL requests from the users.

    The Thrashing Machine shares databases with the WS instances in the Load Balancing Group.

    The Thrashing Machine talks to github.com to clone or pull the datasets that are to be imported or updated.

  • Redis (provided by ElastiCache) - a cache store that caches all DDFQL requests together with their responses.
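The interplay between the Redis cache and the warm-up performed by the Thrashing Machine can be sketched roughly as follows. This is a minimal illustration, not Waffle Server's actual code: the `DdfqlCache` class, the `handle_request` and `warm_up` functions, the in-memory dictionary standing in for Redis, and the shape of the query are all assumptions made for the example.

```python
import json
from collections import deque


class DdfqlCache:
    """In-memory stand-in for the Redis cache (hypothetical, for illustration)."""

    def __init__(self, history_size=500):
        self.store = {}                            # cache key -> cached response
        self.recent = deque(maxlen=history_size)   # keys of the latest requests

    @staticmethod
    def key(query):
        # Serialize with sorted keys so equal queries map to the same cache key.
        return json.dumps(query, sort_keys=True)


def handle_request(cache, query, execute):
    """Serve a query from the cache, falling back to `execute` on a miss."""
    k = cache.key(query)
    cache.recent.append(k)
    if k in cache.store:
        return cache.store[k]
    response = execute(query)
    cache.store[k] = response
    return response


def warm_up(cache, execute):
    """Re-run the recorded recent queries so their responses are cached again."""
    for k in dict.fromkeys(cache.recent):  # drop duplicates, keep order
        cache.store[k] = execute(json.loads(k))
```

In this sketch `execute` represents whatever actually runs the query against the database. Keeping the history in a bounded `deque` means only the latest `history_size` requests are replayed, which mirrors the "500 most recent requests" rule described above.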

Cloudflare

Cloudflare is a proxy that provides DDoS protection, SSL, CDN, caching, and DNS. Among other things, Cloudflare gives us zero-downtime deployment and Blue-Green deployment: all we need to do is deploy a new version of WS, test it, and switch the green (prod) and blue (RC) instances by changing IPs in Cloudflare's interface. It is a vital part of the WS ecosystem.

All requests to both the Waffle Server Load Balancer and the Waffle Server Thrashing Machine go through Cloudflare.
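The switch between the prod and RC instances amounts to exchanging two IP addresses once the new version has passed its checks. The sketch below models only that logic; it does not call Cloudflare's API, and the `promote_release_candidate` function, the `records` mapping, and the `is_healthy` callback are hypothetical names introduced for illustration.

```python
def promote_release_candidate(records, is_healthy):
    """Swap the prod and RC targets, but only if the RC instance passes its check.

    `records` maps the roles "prod" and "rc" to IP addresses (hypothetical shape).
    `is_healthy` is a caller-supplied check that takes an IP and returns a bool.
    """
    rc_ip = records["rc"]
    if not is_healthy(rc_ip):
        raise RuntimeError(f"release candidate at {rc_ip} failed its health check")
    # The old prod instance becomes the new RC slot, ready for the next release.
    return {"prod": rc_ip, "rc": records["prod"]}
```

For example, calling it with `{"prod": "203.0.113.10", "rc": "203.0.113.20"}` and a check that always passes returns the same two addresses with their roles exchanged. Because the old prod instance stays up in the RC slot, rolling back is the same swap applied again.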

MongoDB Cloud Manager

MongoDB Cloud Manager's primary responsibility is to provide a convenient service and UI for monitoring and managing Mongo instances, both standalone and in replica sets. This lets us maintain Mongo without pain (upgrading to new versions, making backups, watching logs, checking DB health, etc.). Cloud Manager is deployable in AWS.

github.com

GitHub is a source of datasets and their changes for Waffle Server.

Tools Page

This is the main WS client, and it talks (using DDFQL) to WS via vizabi-ws-reader.

WS-CLI

This is a command-line utility for managing datasets and their updates in WS.

Among other things, WS-CLI validates a dataset before imports and updates.

To update a dataset, we need to generate a diff between its current state and the previous one. The diff is generated on the WS side, but WS uses an API from WS-CLI to generate it. WS-CLI in turn uses the git-csv-diff module to provide diff generation. git-csv-diff uses the DAFF spec for generating and parsing a diff between two CSV files.
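To make the idea of a diff between two CSV files concrete, here is a deliberately simplified row-level diff. It is not the git-csv-diff API and does not produce real DAFF output; the `diff_csv` function and its `key` parameter are hypothetical, and the `+++`, `---`, and `->` tags are only loosely modeled on the row markers used in daff's highlighter format.

```python
import csv
import io


def diff_csv(old_text, new_text, key):
    """Compare two CSV documents row by row, matching rows on the `key` column.

    Returns a list of (tag, row) pairs, where tag is "+++" for an added row,
    "---" for a removed row, and "->" for a row whose values changed.
    """
    old_rows = {r[key]: r for r in csv.DictReader(io.StringIO(old_text))}
    new_rows = {r[key]: r for r in csv.DictReader(io.StringIO(new_text))}

    changes = []
    for k, row in new_rows.items():
        if k not in old_rows:
            changes.append(("+++", row))
        elif row != old_rows[k]:
            changes.append(("->", row))
    for k, row in old_rows.items():
        if k not in new_rows:
            changes.append(("---", row))
    return changes
```

An incremental update would then apply only the listed changes instead of re-importing the whole dataset. A real diff also has to handle column additions, removals, and renames, which this sketch ignores.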