Deployment Pipeline Show and Tell - chef-boneyard/chef-summit-2014 GitHub Wiki

Location

10/3/2014, Metropolitan, 13:30

Convener

Participants

Summary of Discussions

Short presentations of pipeline implementations.

Pipelines

Eric Wolfe - Marshall U.

Diagram of how he deploys; his blog has a write-up and diagrams. Each service is deployed through a functional role cookbook that wraps community cookbooks and includes a large, org-wide baseline cookbook (itself a wrapper around community cookbooks). In other words, the mail service is deployed with an $org-mail cookbook, which includes the $org-baseline cookbook. All locally maintained cookbooks are at least linted. The expectations provided by community cookbooks are integration tested within the functional role cookbook that includes them, such as $org-mail and $org-baseline.
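The wrapper pattern described above might look like the following sketch. The cookbook names, version, and recipes are hypothetical, not Marshall's actual code; this is just the shape of a functional role cookbook that pulls in the baseline wrapper plus a community cookbook.

```ruby
# org-mail/metadata.rb -- hypothetical functional role cookbook
name    'org-mail'
version '1.0.0'
depends 'org-baseline'   # the org-wide baseline wrapper cookbook
depends 'postfix'        # community cookbook being wrapped (assumed)

# org-mail/recipes/default.rb -- baseline first, then the wrapped service
include_recipe 'org-baseline::default'
include_recipe 'postfix::default'
```

Integration tests against $org-mail then exercise both the baseline expectations and the community cookbook's behavior in one place.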

The pipeline was constructed from cookbook dependencies by modeling afferent (upstream) and efferent (downstream) dependencies. A cookbook's deploy build job blocks while any of its afferent (upstream) build jobs are running. Roughly, the pipeline is constructed with the following flow.

community CBs -> baseline CB -> baseline deploy -> mail CB -> mail deploy to mail env
                                                -> dns CB -> dns deploy to dns env
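The flow above is a dependency graph, and the build order falls out of a topological sort of that graph. A minimal sketch of the idea, using Ruby's standard `tsort` library (the job names are hypothetical, taken from the diagram, not from an actual Jenkins config):

```ruby
require 'tsort'

# Each build job mapped to its afferent (upstream) jobs, mirroring the
# diagram: community CBs feed baseline, baseline deploy gates mail and dns.
DEPS = {
  'community-cbs'   => [],
  'baseline-cb'     => ['community-cbs'],
  'baseline-deploy' => ['baseline-cb'],
  'mail-cb'         => ['baseline-deploy'],
  'mail-deploy'     => ['mail-cb'],
  'dns-cb'          => ['baseline-deploy'],
  'dns-deploy'      => ['dns-cb'],
}

class JobGraph
  include TSort

  def initialize(deps)
    @deps = deps
  end

  def tsort_each_node(&block)
    @deps.each_key(&block)
  end

  def tsort_each_child(node, &block)
    @deps.fetch(node).each(&block)
  end
end

# A valid build order: every job appears after all of its upstream jobs.
ORDER = JobGraph.new(DEPS).tsort
puts ORDER.join(' -> ')
```

In the real pipeline the same effect is achieved by having each deploy job block on its upstream jobs rather than computing a global order up front.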

Each service delivered through the pipeline needed its own Chef environment to isolate dependency calculation. It was discovered that using one all-encompassing "prod" environment could result in race conditions where the DNS build job overwrote the constraints calculated by the Mail build job, which is very bad. One could instead try composing a "prod" environment cookbook, but then a broken DNS build job could block changes to the Mail service.
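Per-service isolation like this comes down to each environment carrying its own `cookbook_versions` pins, so one service's build job never rewrites another's constraints. A hypothetical environment file for the mail service (names and versions are illustrative):

```json
{
  "name": "mail",
  "description": "Isolated environment for the mail service pipeline",
  "json_class": "Chef::Environment",
  "chef_type": "environment",
  "cookbook_versions": {
    "org-mail": "= 1.4.2",
    "org-baseline": "= 2.0.1",
    "postfix": "= 3.1.0"
  }
}
```

The DNS service would have a sibling environment with its own pins, so the two build jobs never race on the same constraint set.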

Deployed 400+ changes with a 0.5-1.0% defect-release rate. Any time a defect was released, positive and negative integration tests were added to cover the regression. Unit tests were added where possible.

Q1. Did he run into cookbook version conflicts? A1. Had problems with the yum cookbook when transitioning from 2.0 to 3.0 (a breaking cookbook change). Had to keep plugging away until 2.0 was removed.

Q2. What about dev environment? A2. Dev environment is ephemeral, and thrown away at the end of integration testing.

Brian Scott - Disney

Went over how cookbooks are tested and deployed. They need to be able to test on servers both with and without internet access. They have an internal tool and use Jenkins to spin up VMs in diverse environments to test cookbooks; one cookbook change may cause 250 environments to spin up for testing. Verify process: Foodcritic (including custom Foodcritic rules), other linters, then berks upload.
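A verify stage along those lines could be sketched as a small shell script. This is an assumption about the shape of the stage, not Disney's actual tooling; each tool is skipped when it is not on PATH so the sketch degrades gracefully outside a CI image that has foodcritic, rubocop, and berks installed.

```shell
# Hypothetical cookbook verify stage: lint, style-check, then publish.
run_if_present() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "running: $*"
    "$@"
  else
    echo "skipping (not installed): $1"
  fi
}

run_if_present foodcritic .    # Chef-specific lint (plus custom rules)
run_if_present rubocop .       # general Ruby style lint
run_if_present berks upload    # publish cookbooks to the Chef server
```

A real pipeline job would fail fast on any non-zero exit instead of skipping.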

Q1. How do they deal with deprecating test-case environments, and reduce the test impact of old environments? A1. Chef working team notifying teams 6-8 months ahead of time that something will not be supported?

Q2. How do you test all the applications that depend on the base cookbooks? A2. They run tests against the whole mess.

Q3. How do cookbooks get to the mini Chef orgs? Is there an automated delivery process through “REMI?” A3. Use the health_inspector gem to check that the versions of the cookbooks are good. Other Disney folks say it may not be happening everywhere.

Q4. How do they deal with versions? A4. “REMI” does a bump of the version.

Trying to get Supermarket up as soon as possible.

They have dev orgs separated from prod orgs. Each team has their own methods, lots of diversity.

Q5. If they could start over what would they do different? A5. A lot of what they do is open source.

James - Nordstrom

Presented future plans (the current pipeline sucks) and a diagram of the current pipeline.

Phases: style, unit test, integration test (Test Kitchen), then a publish phase to the git repository (GitHub), the Chef server, the public Supermarket, and a private Supermarket.

Want these phases repeatable at the local workstation level. Put them in a Rakefile; `rake style` and friends should be exactly what the pipeline runs. A custom chef generator was written that sets up the Rakefile and sample tests. The Gemfiles are bundled too.
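A Rakefile along those lines might look like the sketch below. This is an assumption about the shape of Nordstrom's setup, not their actual file; the task bodies here only record which phase ran, where a real Rakefile would shell out to the tools (Foodcritic, RSpec, Test Kitchen).

```ruby
require 'rake'
extend Rake::DSL

RAN = []  # records which phases ran, for illustration only

task :style do
  RAN << :style          # e.g. sh 'foodcritic .'
end

task :unit do
  RAN << :unit           # e.g. sh 'rspec'
end

# Integration depends on style and unit, so `rake` runs all three in order.
task integration: [:style, :unit] do
  RAN << :integration    # e.g. sh 'kitchen test'
end

task default: :integration

Rake::Task[:default].invoke
puts RAN.inspect
```

Because the pipeline invokes the same Rake tasks, a green local `rake` run means the pipeline's verify stage should pass too.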

Observation: this is internal to Nordstrom, a "here's how to do things" guide. Discussion about trust. The pipeline doesn't deliver straight to prod. Also uses the Berksfile.lock to pin exactly what we want.

State of the art evaluates things at a certain point in time. How should we deal with updated standards? How do we make the right thing the easy thing?

Things that have to be a certain way shouldn’t be allowed. James disagrees.

Q1. Who would like to track Chef coverage in some type of file? A1. We want to be able to track cookbook coverage and health in an objective sense.

Q2. How do we test the whole Chef infrastructure? A2. Can use chef-zero and mock data.

Paul - Rackspace

Berkshelf model: the application and cookbook are kept in the same repo and go through the Berkshelf workflow, including `berks apply`. They have an OpenStack app; the pipeline code is called Solum.
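For the app-plus-cookbook-in-one-repo layout, the Berksfile can be as small as the fragment below: `berks install` resolves and locks dependencies, `berks upload` pushes the cookbooks to the Chef server, and `berks apply <environment>` pins the locked versions onto a Chef environment. The fragment is a generic sketch, not Rackspace's actual file.

```ruby
# Berksfile -- hypothetical minimal example for a cookbook kept with its app
source 'https://supermarket.chef.io'

metadata  # pull name and dependencies from the cookbook's metadata.rb
```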

Q1. Do you use just one environment for every type of service? A1. What do you mean by service?

Q2. Do you just run a bunch of web servers in one monolithic environment, or do you also have backend databases, etc within that environment? A2. We have multiple types of systems in this environment.

Q3. Have you ever encountered a broken build in this one environment that then blocked changes to another service? A3. Yes, just once.

Seth Chisamore - Chef

Build -> Test -> Pub

Build minimal, test on a matrix. Use a custom layout in Artifactory. Have two branches, current and stable, with manual promotion from current to stable. Promotion kicks off Ruby scripts and things feed out to the public cloud.
Test a deep topology matrix; trying to do fully automated testing via chef-metal. Current right now holds test candidates; will move to almost-daily moves with automated promotion. Customers will be able to consume (early) packages out of the git repos.

What will we do now? What needs to happen next?