Continual Resiliency Testing - huit/cloud-boot-camp GitHub Wiki
Description
This small exercise challenges you to develop a solution that tests a service deployed in AWS for resiliency (fault tolerance) and correctness. This will likely use the Simian Army toolset from Netflix. The solution will regularly attempt to damage a running service and, coupled with high-quality monitoring, will help HUIT develop strong confidence in the service's ability to tolerate faulty components.
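The core of a Chaos Monkey-style test is small: find the running instances that belong to the target service, pick one at random, and terminate it. The sketch below assumes a boto3 EC2 client (`boto3.client("ec2")`) is passed in. `pick_victim` and `unleash` are hypothetical names, and the default `dry_run=True` only reports the choice without terminating anything.

```python
from __future__ import annotations

import random
from typing import Any, Optional


def pick_victim(reservations: list[dict[str, Any]], rng: random.Random) -> Optional[str]:
    """Return the ID of one running instance chosen at random, or None if none run."""
    # EC2 DescribeInstances nests instances inside reservations; flatten them.
    running = sorted(
        inst["InstanceId"]
        for res in reservations
        for inst in res.get("Instances", [])
        if inst.get("State", {}).get("Name") == "running"
    )
    # Sorting makes a seeded RNG reproducible regardless of API result order.
    return rng.choice(running) if running else None


def unleash(ec2_client: Any, filters: list[dict[str, Any]],
            rng: random.Random, dry_run: bool = True) -> Optional[str]:
    """Pick one instance matching `filters` and terminate it unless dry_run is set."""
    # Reads a single page only; a real run would use
    # ec2_client.get_paginator("describe_instances") to see every instance.
    resp = ec2_client.describe_instances(Filters=filters)
    victim = pick_victim(resp.get("Reservations", []), rng)
    if victim is not None and not dry_run:
        ec2_client.terminate_instances(InstanceIds=[victim])
    return victim
```

Running this on a schedule and checking that monitoring stays healthy while the service's Auto Scaling group replaces the lost instance is one way to close the resiliency loop described above.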
The solution will also monitor a deployed application for "wayward" instances and ensure that deployed resources conform to a set of predefined standards.
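Conformity checks can be expressed as rules applied to each instance description, in the spirit of the Simian Army's Conformity Monkey. The rules below (required tags and an instance-type allowlist) are assumptions chosen for illustration, as is the definition of "wayward": an instance running outside any Auto Scaling group, detected through the `aws:autoscaling:groupName` tag that EC2 Auto Scaling adds to the instances it launches.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ConformityRules:
    # Hypothetical standards; a real deployment would load these from config.
    required_tags: set[str] = field(default_factory=lambda: {"Service", "Owner"})
    allowed_types: set[str] = field(default_factory=lambda: {"t3.small", "t3.medium"})


def tags_of(instance: dict[str, Any]) -> dict[str, str]:
    # EC2 returns tags as a list of {"Key": ..., "Value": ...} pairs.
    return {t["Key"]: t["Value"] for t in instance.get("Tags", [])}


def check_instance(instance: dict[str, Any], rules: ConformityRules) -> list[str]:
    """Return human-readable violations; an empty list means conformant."""
    problems = []
    tags = tags_of(instance)
    missing = sorted(rules.required_tags - set(tags))
    if missing:
        problems.append(f"missing tags: {', '.join(missing)}")
    itype = instance.get("InstanceType")
    if itype not in rules.allowed_types:
        problems.append(f"disallowed instance type: {itype}")
    # Assumed definition of "wayward": not managed by an Auto Scaling group.
    if "aws:autoscaling:groupName" not in tags:
        problems.append("wayward: not in an Auto Scaling group")
    return problems
```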
Features of Solution
The solution will be easily deployable and will provide basic testing for a target deployed service (such as a scaled Drupal site).
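A basic correctness test can start as an HTTP probe against the target's public endpoint: the check passes only if the page returns status 200 and contains an expected marker string. This sketch uses only the Python standard library; `probe` and the marker-string convention are hypothetical.

```python
import urllib.request


def probe(url: str, marker: str = "", timeout: float = 5.0) -> bool:
    """Return True if `url` answers HTTP 200 with `marker` somewhere in the body."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            if resp.status != 200:
                return False
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError and HTTPError (raised for 4xx/5xx) subclass OSError, as do
        # refused connections and socket timeouts, so all count as failures.
        return False
    return marker in body
```

Running the probe before and after each induced failure gives a simple pass/fail signal to feed into monitoring.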
Architectural features
- Use AWS-native services where feasible.
- The solution will interact only with the target deployed service.
- The solution will be as reusable and generalizable as is reasonably feasible (i.e., don't hard-code details about a specific site or platform).
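One way to keep the solution generic and confined to a single service is to identify the target by a tag whose key and value are supplied as configuration, then translate that into EC2 `DescribeInstances` filters. The `Target` class and the tag convention are assumptions; `tag:<key>` and `instance-state-name` are standard EC2 filter names.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Target:
    """Identifies the one service under test; values come from config, not code."""
    tag_key: str    # e.g. "Service" (hypothetical tag convention)
    tag_value: str  # e.g. "drupal-prod"

    def ec2_filters(self, states: tuple[str, ...] = ("running",)) -> list[dict[str, Any]]:
        # Restrict results to instances carrying the target tag in the given states.
        return [
            {"Name": f"tag:{self.tag_key}", "Values": [self.tag_value]},
            {"Name": "instance-state-name", "Values": list(states)},
        ]
```

The result can be passed directly to `describe_instances(Filters=...)` on a boto3 EC2 client. On the target side, the main change needed is to tag the service's instances consistently, for example through its Auto Scaling group with tag propagation at launch enabled.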
Deployment features
- The solution should be deployable with a single command.
- The solution will be separate from the deployed application, and can be deployed independently.
- Separate build, release, and run stages where possible.
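A single-command deployment can wrap the AWS CLI's `aws cloudformation deploy`, which creates the stack or updates it through a change set, with `aws cloudformation delete-stack` for teardown. In this sketch the template file plays the role of the build artifact, the parameter overrides supply the release configuration, and the running stack is the run stage. The helper names, parameter names, and the `CAPABILITY_IAM` flag (assumed because the testing components would likely need an IAM role) are illustrative assumptions.

```python
from __future__ import annotations


def deploy_command(stack: str, template: str, params: dict[str, str]) -> list[str]:
    """Build the one `aws cloudformation deploy` call that creates or updates the stack."""
    cmd = [
        "aws", "cloudformation", "deploy",
        "--stack-name", stack,
        "--template-file", template,
        # Acknowledges that the template may create IAM roles (assumed here).
        "--capabilities", "CAPABILITY_IAM",
    ]
    if params:
        # Release-time configuration, kept out of the template itself.
        cmd.append("--parameter-overrides")
        cmd.extend(f"{k}={v}" for k, v in sorted(params.items()))
    return cmd


def destroy_command(stack: str) -> list[str]:
    """Build the command that tears the stack down."""
    return ["aws", "cloudformation", "delete-stack", "--stack-name", stack]
```

Executing either command is then a single call such as `subprocess.run(deploy_command("resiliency-test", "template.yaml", {"TargetTagKey": "Service"}), check=True)`.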
Documentation
- All code and artifacts should be put into a single GitHub repository, unless pulling in code from external sources.
- Document your code layout and build process.
- Diagram the architecture and process workflows.
- Document how to deploy, update, and destroy the running solution using CloudFormation or similar orchestration.
- Document how to target a specified service, and that service alone, including any changes needed in the target service's deployment.
- Ensure that sensitive data and configs are kept separate from code and artifacts, in keeping with Twelve-Factor App principles.
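Keeping configuration out of the code, as the Twelve-Factor App recommends, can be as simple as reading every setting from environment variables and failing fast when a required one is missing. The variable names below are hypothetical. Credentials are deliberately absent because on AWS they would normally come from the IAM role the solution runs under rather than from config.

```python
from __future__ import annotations

import os
from collections.abc import Mapping
from dataclasses import dataclass

# Hypothetical variable names for the target selection.
REQUIRED = ("CRT_TARGET_TAG_KEY", "CRT_TARGET_TAG_VALUE")


@dataclass(frozen=True)
class Settings:
    target_tag_key: str
    target_tag_value: str
    dry_run: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from environment variables, failing fast on missing ones."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"missing required settings: {', '.join(missing)}")
    return Settings(
        target_tag_key=env["CRT_TARGET_TAG_KEY"],
        target_tag_value=env["CRT_TARGET_TAG_VALUE"],
        # Anything other than an explicit "false" keeps destructive actions off.
        dry_run=env.get("CRT_DRY_RUN", "true").strip().lower() != "false",
    )
```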