Setting up Druid dependencies

The prerequisites for a production setup of Druid in AWS are as follows:

  1. Amazon S3 to act as the data repository for Druid ("deep storage")
  2. Postgres on Amazon RDS to act as the metadata storage for Druid
  3. Apache ZooKeeper to coordinate the Druid clusters

Let's configure/install each of these in turn.

### 1. Amazon S3

We will use Amazon S3 as the data repository for Druid ("deep storage").
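
As a rough illustration of what pointing Druid at S3 involves, the sketch below renders the deep-storage lines of a Druid runtime properties file. The property names (`druid.storage.type`, `druid.storage.bucket`, `druid.storage.baseKey`) are the ones we understand Druid's S3 extension to read. The helper function, the bucket name and the key prefix are hypothetical, and the way the S3 extension itself is loaded varies between Druid releases, so it is left out here.

```python
def s3_deep_storage_properties(bucket: str, base_key: str) -> str:
    """Render the deep-storage lines of a Druid runtime properties file."""
    if not bucket:
        raise ValueError("bucket must not be empty")
    props = {
        "druid.storage.type": "s3",
        "druid.storage.bucket": bucket,
        # Key prefix under which segments are written; slashes at either end are dropped.
        "druid.storage.baseKey": base_key.strip("/"),
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical bucket and prefix, for illustration only.
    print(s3_deep_storage_properties("my-druid-deep-storage", "druid/segments"))
```

The rendered lines would go into the common runtime properties shared by all Druid nodes, so that every node reads and writes segments in the same location.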

ADD REST OF SECTION

### 2. Postgres on Amazon RDS

We will use a PostgreSQL instance running on Amazon RDS as the metadata storage for Druid.
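
A minimal sketch of the corresponding Druid settings follows, assuming the `druid.metadata.storage.*` property names used by recent Druid releases (older releases used different names, so check the documentation for your version). The helper function, the RDS endpoint, the database name and the user are all hypothetical. The password is deliberately not rendered, since it should come from a secret store and not from source code.

```python
def postgres_metadata_properties(host: str, database: str, user: str, port: int = 5432) -> str:
    """Render the metadata-storage lines of a Druid runtime properties file."""
    # Standard PostgreSQL JDBC URL: jdbc:postgresql://host:port/database
    uri = f"jdbc:postgresql://{host}:{port}/{database}"
    lines = [
        "druid.metadata.storage.type=postgresql",
        f"druid.metadata.storage.connector.connectURI={uri}",
        f"druid.metadata.storage.connector.user={user}",
        # druid.metadata.storage.connector.password is intentionally omitted here.
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical RDS endpoint, database and user, for illustration only.
    print(postgres_metadata_properties(
        "druid-metadata.example.us-east-1.rds.amazonaws.com", "druid", "druid"))
```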

ADD REST OF SECTION

### 3. Apache ZooKeeper

We will use Apache ZooKeeper as the cluster coordination service for Druid.

Setting up and running a production ZooKeeper cluster is outside the scope of this documentation. We strongly recommend reading ZooKeeper (O'Reilly) before proceeding.

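Create a ZooKeeper cluster on EC2 with an odd number of nodes (at least three). ZooKeeper needs a majority of nodes to be available, so an even-sized cluster tolerates no more failures than the odd-sized cluster one node smaller. Do not run any Druid components on these ZooKeeper nodes.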
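
The sizing advice follows from ZooKeeper's majority quorum, and the short sketch below works through the arithmetic. It also shows the shape of the `server.N` lines in `zoo.cfg` and the comma-separated connection string that clients use. The function names and hostnames are hypothetical. Ports 2888, 3888 and 2181 are the conventional ZooKeeper defaults, and we assume Druid takes the connection string through its `druid.zk.service.host` property.

```python
from typing import List


def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Nodes that can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def zoo_cfg_servers(hosts: List[str]) -> List[str]:
    """zoo.cfg ensemble lines, using the conventional peer and election ports."""
    return [f"server.{i}={host}:2888:3888" for i, host in enumerate(hosts, start=1)]


def connect_string(hosts: List[str], client_port: int = 2181) -> str:
    """Comma-separated host:port list, as used in a client connection string."""
    return ",".join(f"{host}:{client_port}" for host in hosts)


if __name__ == "__main__":
    hosts = ["zk1.example.internal", "zk2.example.internal", "zk3.example.internal"]
    for n in (3, 4, 5):
        print(n, "nodes tolerate", tolerated_failures(n), "failure(s)")
    print("\n".join(zoo_cfg_servers(hosts)))
    print("druid.zk.service.host=" + connect_string(hosts))
```

A four-node ensemble survives only one failure, the same as a three-node ensemble, which is why the extra node in an even-sized cluster adds cost without adding fault tolerance.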
