integration testing multiple repos - benclifford/text GitHub Wiki

Integration Testing Multiple Repos

Motivation

I'm trying to address two related situations here involving one project being built out of multiple git repositories, all under the control of the same group of people.

The motivation for this note comes from working on a project with multiple git repositories, where an "upstream" component regularly changed its API, causing the compile for a "downstream" component to now fail in CI, despite not having been changed since its last successful CI build. This led to a lot of finger pointing and frustration.

I've recently come to work on a couple of other projects which involve multiple repositories, and this is my attempt to resolve those frustrations.

My approach is to run full-project integration testing in every repository, so that a pull request which passes repo-local tests (such as unit tests) but which breaks a downstream component is marked as a CI failure.

From a social perspective, the important point is that this requires buy-in from the owners of every repository involved - a large part of the initial problem was that "upstream" components didn't seem very invested in not breaking their "downstream" components' builds.

Techniques

I'm attempting to set up an environment which can be built in a CI test, with GitHub Actions being my immediate CI system, and which can also be built in a developer's environment with arbitrary changes to arbitrary repos, without needing to commit and push to a separate build system.

There are two broad stages:

Prepare the candidate source trees all in a row next to each other.
Test those source trees, with all references to other components being pointed at those source trees.

1. Prepare the candidate source trees all in a row next to each other.

In a developer environment, this is pretty straightforward: have a src/ directory, and have each repository checked out into that, as src/foo/, src/bar/, etc. Likely those can be developers' working directories in which arbitrary local changes have been made.
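As a rough sketch of that layout, a small helper can report which components are not yet checked out side by side. The component names repo_a and repo_b are placeholders borrowed from the action definition later in this note, and the function name is made up for illustration.

```shell
# Placeholder component names; replace with the project's real repositories.
COMPONENTS="repo_a repo_b"

# Print the name of every component that has no directory under the
# given source directory (for example src/).
missing_components() {
  local src_dir name
  src_dir="$1"
  for name in $COMPONENTS; do
    if [ ! -d "$src_dir/$name" ]; then
      echo "$name"
    fi
  done
}
```

Running `missing_components src` from the directory containing src/ prints nothing once every component is present, which makes it a cheap precondition before starting an integrated build.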

In a CI integration environment, this is more complicated: it is easy to check out the repos all in a row, but less easy to check out the right tags/branches.

In GitHub Actions, my choice of branches is that everything builds from the main branch except the repository in which the candidate PR lives, for which the PR candidate is checked out.

While developing features across repos, it would also be reasonable for a branch to override the branches of other repos for testing.
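The selection rule described above can be sketched as a small shell function. This is only an illustration under assumptions: the function name and the "name=ref" override format are invented here, and the candidate repository always wins over any override.

```shell
# Choose the ref to check out for one component.
# Usage: ref_for COMPONENT CANDIDATE_REPO CANDIDATE_REF [OVERRIDES]
# OVERRIDES is a hypothetical space-separated list of name=ref pairs.
ref_for() {
  local component candidate_repo candidate_ref overrides pair
  component="$1"
  candidate_repo="$2"
  candidate_ref="$3"
  overrides="${4:-}"

  # The repository under test uses the candidate (e.g. the PR ref).
  if [ "$component" = "$candidate_repo" ]; then
    echo "$candidate_ref"
    return 0
  fi

  # A feature branch may pin other repos to matching branches.
  for pair in $overrides; do
    if [ "${pair%%=*}" = "$component" ]; then
      echo "${pair#*=}"
      return 0
    fi
  done

  # Everything else builds from main.
  echo main
}
```

For example, `ref_for repo_b repo_a refs/pull/7/merge "repo_b=feature-x"` selects feature-x for repo_b, while the same call without the override selects main.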

2. Test those source trees

Test those checked-out source trees. Those source trees will reference each other somehow (because they are integrated).

For example, in one of my projects there are two ways in which source trees interact:

docker-compose builds Docker images from several source trees and connects them together via the filesystem and network connections.
Some repositories provide Python modules which can be installed into the Python environments of other components. [I'm still messing around making this happen in my main project which wants to do this inside docker...]

It is important that every reference to another component comes from the local source tree, and not from (for example) a different version in an online image or package repository.
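One way to guard against a reference silently resolving somewhere else is to check that the resolved location sits inside the local source directory. The sketch below is an assumption about how such a check might look, not something from my projects; the function name is hypothetical.

```shell
# Succeed only if directory $2 lies inside (or is) directory $1.
# Both paths are resolved to physical paths first, so symlinks and
# relative paths do not fool the comparison.
is_under_src() {
  local src_root target
  src_root=$(cd "$1" && pwd -P) || return 1
  target=$(cd "$2" && pwd -P) || return 1
  case "$target/" in
    "$src_root"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For the Python case, after an editable install such as `pip install -e src/repo_a`, the directory that Python reports for the imported module (repo_a is a placeholder name) can be passed as the second argument, and the build can fail if the check does not succeed.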

Implementing this in GitHub Actions

I've implemented this as an integration repository containing the definition of a custom GitHub Action, which is then called by each repository in its GitHub Actions workflow. (It is much more verbose boilerplate than I find personally tasteful.)

In integration/.github/actions/integ/action.yml,

name: 'Integrate components'
description: "Integration test of all components"
inputs:
  repo_a_commit:
    description: "branch/commit of component A to test"
    required: false
    default: 'main'
  repo_a_repository:
    description: "Repository of component A to test"
    required: false
    default: 'myproj/repo_a'
  repo_b_commit:
    description: "branch/commit of component B to test"
    required: false
    default: 'main'
  repo_b_repository:
    description: "Repository of component B to test"
    required: false
    default: 'myproj/repo_b'
runs:
  using: composite
  steps:
    - uses: actions/checkout@v2
      with:
        repository: ${{ inputs.repo_a_repository }}
        ref: ${{ inputs.repo_a_commit }}
        path: repo_a
    - uses: actions/checkout@v2
      with:
        repository: ${{ inputs.repo_b_repository }}
        ref: ${{ inputs.repo_b_commit }}
        path: repo_b
    - name: integrated build
      shell: bash
      run: |
        echo whatever the build and test stuff is here

Then, in each component repository, call this action, overriding only the inputs for that component. For example, in component A's .github/workflows/ci.yaml:

jobs:
  integration-test:
    runs-on: ubuntu-20.04
    steps:
      - uses: myproj/integration/.github/actions/integ@main
        with:
          repo_a_repository: ${{ github.repository }}
          repo_a_commit: ${{ github.ref }}

This will run the integration test with everything coming from main, except component A, which will point to the candidate reference (e.g. a PR) under test.

Alternatives

I'm a fan of a small number of large repos (aka monorepos), and my general feeling is that if a project is so tightly coupled that it needs the techniques described in this note, then it is likely to be better off inside a monorepo. This can facilitate much more atomic commits across components.

All tests are integration tests

Not really true, but a large chunk of code that I've worked on needs to interact with other modules, even for really basic testing: either you need to mock that up (and be integrating with a different fake version of the mocked code) or integrate with the real thing.

My opinion mostly is that you should embrace that integration, rather than trying to work around it with mocks.

Conclusion

Please use a monorepo.