Continual Resiliency Testing - huit/cloud-boot-camp GitHub Wiki

Description

This small exercise challenges you to develop a solution that tests a service deployed in AWS for resiliency (fault tolerance) and correctness. It will likely use the Simian Army toolset from Netflix. The solution will regularly attempt to damage a running service and, coupled with high-quality monitoring, will help HUIT develop strong confidence in the service's ability to tolerate faulty components.

The solution will also monitor a deployed application for "wayward" instances and ensure that deployed resources conform to a set of predefined standards.
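As a rough illustration of the conformance check described above, the sketch below flags instances that are missing required tags. The tag names in `REQUIRED_TAGS` and the function `find_wayward_instances` are hypothetical, since the exercise does not define the standards. The input is assumed to have the shape of the instance records returned by the EC2 `DescribeInstances` API (an `InstanceId` plus a `Tags` list of `Key`/`Value` pairs).

```python
# Hypothetical standard: every instance must carry these tag keys.
REQUIRED_TAGS = {"service", "environment", "owner"}


def find_wayward_instances(instances, required_tags=REQUIRED_TAGS):
    """Map each nonconforming instance ID to its sorted list of missing tag keys."""
    wayward = {}
    for instance in instances:
        # Instances with no tags at all have no "Tags" key in the record.
        present = {tag["Key"] for tag in instance.get("Tags", [])}
        missing = sorted(set(required_tags) - present)
        if missing:
            wayward[instance["InstanceId"]] = missing
    return wayward


# Sample records shaped like EC2 DescribeInstances output (made-up IDs).
sample = [
    {
        "InstanceId": "i-aaa",
        "Tags": [
            {"Key": "service", "Value": "drupal"},
            {"Key": "environment", "Value": "dev"},
            {"Key": "owner", "Value": "huit"},
        ],
    },
    {"InstanceId": "i-bbb", "Tags": [{"Key": "service", "Value": "drupal"}]},
    {"InstanceId": "i-ccc"},
]

report = find_wayward_instances(sample)
for instance_id, missing in report.items():
    print(f"{instance_id} is missing tags: {', '.join(missing)}")
```

In a real deployment the records would come from the AWS API rather than a literal list, and the standards would likely cover more than tags (instance types, security groups, and so on).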

Features of Solution

The solution will be easily deployable and will provide basic testing for a target deployed service (such as a scaled Drupal site).

Architectural features

  1. Use AWS-native services where feasible.
  2. The solution will interact with only the deployed service.
  3. The solution will be as reusable and generalizable as is reasonably feasible (i.e., don't hardcode details of a specific site or platform).
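One way to satisfy features 2 and 3 together is to select fault-injection targets purely by a tag that is passed in as a parameter, so nothing about a specific site is hardcoded. The following is a minimal sketch under that assumption. The function names and the `service` tag are hypothetical, and the input is assumed to be shaped like EC2 `DescribeInstances` instance records.

```python
import random


def eligible_victims(instances, tag_key, tag_value):
    """Return IDs of running instances that carry the target service's tag."""
    victims = []
    for instance in instances:
        running = instance.get("State", {}).get("Name") == "running"
        tagged = any(
            tag["Key"] == tag_key and tag["Value"] == tag_value
            for tag in instance.get("Tags", [])
        )
        if running and tagged:
            victims.append(instance["InstanceId"])
    return victims


def pick_victim(instances, tag_key, tag_value, rng=None):
    """Pick one eligible instance at random, or None if there is none."""
    if rng is None:
        rng = random.Random()
    candidates = eligible_victims(instances, tag_key, tag_value)
    return rng.choice(candidates) if candidates else None


# Made-up fleet: only the running, correctly tagged instances are eligible.
fleet = [
    {"InstanceId": "i-001", "State": {"Name": "running"},
     "Tags": [{"Key": "service", "Value": "drupal-demo"}]},
    {"InstanceId": "i-002", "State": {"Name": "stopped"},
     "Tags": [{"Key": "service", "Value": "drupal-demo"}]},
    {"InstanceId": "i-003", "State": {"Name": "running"},
     "Tags": [{"Key": "service", "Value": "other-app"}]},
    {"InstanceId": "i-004", "State": {"Name": "running"},
     "Tags": [{"Key": "service", "Value": "drupal-demo"}]},
]

candidates = eligible_victims(fleet, "service", "drupal-demo")
victim = pick_victim(fleet, "service", "drupal-demo", rng=random.Random(0))
print("eligible:", candidates)
print("selected:", victim)
```

Keeping selection separate from the action that damages the instance makes it easy to run the tool in a dry-run mode and to verify that instances outside the target service are never chosen.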

Deployment features

  1. The solution should be deployed from a single command.
  2. The solution will be separate from the deployed application, and can be deployed independently.
  3. Separate build, release, and run stages where possible.

Documentation

  1. All code and artifacts should be put into a single GitHub repository, unless pulling in code from external sources.
  2. Document your code layout and build process.
  3. Diagram architecture and process workflows.
  4. Document how to deploy, update, and destroy the running solution using CloudFormation or similar orchestration.
  5. Document how to target a specified service, and that service alone, including potential changes needed in the deployment of the target service.
  6. Ensure that sensitive data and configs are kept separate from code and artifacts, in keeping with Twelve-Factor App standards.
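For the last point, a common way to keep configuration out of code is to read it from environment variables at startup. The sketch below illustrates that pattern only. The variable names (`CHAOS_TARGET_TAG_KEY` and so on), the defaults, and the `ChaosConfig` class are hypothetical and not part of the exercise.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ChaosConfig:
    target_tag_key: str
    target_tag_value: str
    interval_minutes: int
    dry_run: bool


def load_config(env=None):
    """Build the configuration from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    required = ("CHAOS_TARGET_TAG_KEY", "CHAOS_TARGET_TAG_VALUE")
    missing = [name for name in required if name not in env]
    if missing:
        raise KeyError(f"missing required settings: {', '.join(missing)}")
    return ChaosConfig(
        target_tag_key=env["CHAOS_TARGET_TAG_KEY"],
        target_tag_value=env["CHAOS_TARGET_TAG_VALUE"],
        # Assumed defaults: run hourly, and do not damage anything unless told to.
        interval_minutes=int(env.get("CHAOS_INTERVAL_MINUTES", "60")),
        dry_run=env.get("CHAOS_DRY_RUN", "true").lower() != "false",
    )


# Passing a plain dict instead of os.environ keeps the example self-contained.
config = load_config({
    "CHAOS_TARGET_TAG_KEY": "service",
    "CHAOS_TARGET_TAG_VALUE": "drupal-demo",
    "CHAOS_INTERVAL_MINUTES": "15",
})
print(config)
```

Defaulting `dry_run` to true is a deliberate choice in this sketch: a deployment that forgets to set the variable reports what it would do instead of terminating instances.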