Associativity invariant for master branch build deploy. - benclifford/text GitHub Wiki

Not sure if this has been written in this form. Not sure if associativity is the right term.

In maths: associativity: a + (b + c) = (a + b) + c

As long as we keep the order of things, it doesn't matter how we group the operations. We should get the same result.

Working with git master, we have a bunch of commits (actually not linear, because of merging). For example: A -> B -> C -> D -> E

We can deploy E directly. But more likely we deployed A, then a later version C, and then a later version E. We would like what we have at the end to be the same either way (that's what I'm using the term associativity for).

Deploy means different things in different places, but at least in my experience it involves: running through a caching build system (e.g. make); deploying with database migrations.

  • in the build system, bad dependency checking/cache invalidation (the famous CS problem): you need to "make clean" to get something to build, because it builds from clean but not with the previous version's detritus around. More awkwardly, I've seen a sequence of commits with GHC JS, A -> B -> C -> D, where building A then D fails but building A, B, C, D in sequence works. Sometimes a make clean isn't even enough, when erroneous state is stored more globally (see stack #2365).

  • fiddling with database migrations: this has something to do with storing diffs in version control, rather than storing the levels and letting deployment compute the diffs from version control (which is hard, because migrations can do more than adjust the schema). For example, if you fiddle with migrations to make them work on what you regard as a dev branch, and then merge the whole history of that dev branch into master, you are now fiddling with migrations on master. That's an argument for squash merges. Once your migration has hit master, you can't edit it, because it has already been applied; but the version control system doesn't stop you doing that. (I've had vaguely plausible reasons to make such edits in the past, though not particularly nice ones. For example, someone applied a migration by hand rather than through the migration system and we now want to record what was done: first make a no-op migration, let it be applied, and then edit it to contain what was done manually. The edited migration won't get executed because it's already recorded as applied.) Automatic schema migration stuff that I've used (with groovy) has been ickily unreliable in the past, and also can't cope with non-schema migrations (for example, data conversions).

  • in the deploy system which deploys .deb packages into sort-of-transient docker containers, we can do two things: i) upgrade the .deb using apt, and ii) create a new transient container and install the .deb as a first install. These should behave the same.
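The no-op-then-edit trick from the migrations bullet can be sketched with a toy migration runner. This is only an illustration: run_migrations, the migration name and the list standing in for a database are all made up here, and real migration systems record applied migrations in their own ways.

```python
def run_migrations(migrations, applied, db):
    """Apply, in order, every migration whose name is not yet recorded."""
    for name, action in migrations:
        if name in applied:
            continue  # recorded as applied: never re-run, even if edited since
        action(db)
        applied.add(name)


# Hypothetical production database where a change was already made by hand.
prod_db, prod_applied = ["manual change"], set()

# Step 1: ship a no-op migration and let it be recorded as applied.
run_migrations([("001_record_manual_change", lambda db: None)], prod_applied, prod_db)

# Step 2: edit the same migration so it contains what was done manually.
edited = [("001_record_manual_change", lambda db: db.append("manual change"))]
run_migrations(edited, prod_applied, prod_db)  # skipped on production

# A fresh install runs the edited body and ends up in the same state.
fresh_db, fresh_applied = [], set()
run_migrations(edited, fresh_applied, fresh_db)
```

Production and the fresh install end up with the same contents, which is the point of the trick. It only works because the runner trusts the recorded name and never looks at the migration body again.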

notation

Let a, b, c be git tree-ish values, always written in a sensible order (for example, the order they appear in history).

a ↝ b means "deploy a then deploy b"

a ↝ (b ↝ c) means "deploy a then deploy c"

(a ↝ b) ↝ c means "deploy a then deploy b then deploy c"

In general, l ↝ r means "deploy however l wants to, then deploy the final version from r"

There's an additional rule that says history should be forgettable:

l ↝ r = r
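To make the reading of the brackets concrete, here is a small sketch that expands a bracketed expression into the list of versions that actually get deployed. The tuple encoding of l ↝ r and the function names are my own assumptions for illustration, not part of any tool.

```python
def final_version(plan):
    """Return the last version named in a plan."""
    while isinstance(plan, tuple):
        plan = plan[1]
    return plan


def versions_deployed(plan):
    """List the versions deployed, in order, reading (l, r) as l ↝ r.

    A plan is either a version label (a string) or a pair (l, r):
    deploy however l wants to, then deploy only the final version from r.
    """
    if isinstance(plan, str):
        return [plan]
    left, right = plan
    return versions_deployed(left) + [final_version(right)]


right_grouped = versions_deployed(("a", ("b", "c")))  # a ↝ (b ↝ c)
left_grouped = versions_deployed((("a", "b"), "c"))   # (a ↝ b) ↝ c
```

Here right_grouped is the list a, c and left_grouped is the list a, b, c. The associativity invariant asks that actually carrying out either list leaves the system in the same state; the forgettable-history rule asks that this state is also what deploying c alone would give.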

subsequence view

Given a sequence of versions, deploying any subsequence (in order) that includes the final version should yield the same result as deploying any other in-order subsequence that includes the final version.
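The subsequence view suggests a brute-force check, sketched below. It assumes a deploy can be modelled as a function from (state, version) to a new hashable state; the two toy deploy functions are invented to show one deploy that satisfies the invariant and one that doesn't.

```python
from itertools import combinations


def subsequences_ending_at_last(versions):
    """Yield every in-order subsequence that includes the final version."""
    *earlier, last = versions
    for k in range(len(earlier) + 1):
        for chosen in combinations(earlier, k):
            yield list(chosen) + [last]


def check_invariant(versions, deploy, initial_state):
    """Return True if every such subsequence leaves the same final state."""
    results = set()
    for seq in subsequences_ending_at_last(versions):
        state = initial_state
        for version in seq:
            state = deploy(state, version)
        results.add(state)
    return len(results) == 1


def clean_deploy(state, version):
    # Forgets history: the result depends only on the version deployed.
    return frozenset({f"{version}.bin"})


def leaky_deploy(state, version):
    # Leaves earlier artifacts behind, like a build without "make clean".
    return state | frozenset({f"{version}.bin"})


versions = ["A", "B", "C", "D", "E"]
clean_ok = check_invariant(versions, clean_deploy, frozenset())
leaky_ok = check_invariant(versions, leaky_deploy, frozenset())
```

With these toy functions clean_ok is True and leaky_ok is False. For n versions there are 2^(n-1) subsequences to try, so against a real deploy this is only practical for short histories or a sampled subset.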

migration ordering

Commonly people label migrations with a date, and assume migrations will be applied in date order. That works in the naive case, but not in the presence of branching, when the date is the date the migration was created rather than the date it went into production/master. Maybe a merge requirement should be that no migration is added that would appear "before" an existing migration. Although if the migrations in both branches commute, this isn't a problem (and I've not experienced it in practice...).
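A merge-time check along these lines might look like the sketch below. It assumes migration labels start with an ISO-style date so that plain string comparison matches date order; the function name and the example labels are made up.

```python
def out_of_order_migrations(master, branch):
    """Return new migrations on `branch` that sort before master's newest one.

    Both arguments are lists of migration labels. Labels are assumed to
    begin with a YYYY-MM-DD date, so string order is date order.
    """
    known = set(master)
    newest = max(master) if master else ""
    return sorted(m for m in branch if m not in known and m < newest)


master = ["2016-03-01_add_users", "2016-03-10_add_orders"]
branch = [
    "2016-03-01_add_users",
    "2016-03-05_add_index",   # created on the branch before master moved on
    "2016-03-12_add_audit",
]
offenders = out_of_order_migrations(master, branch)
```

Here offenders contains only 2016-03-05_add_index, which would slot in before a migration that master has already applied. The check is deliberately strict: it can't tell whether the offending migration commutes with the ones it jumps ahead of, so it will also flag harmless cases.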