Driver: GitHub - adobe/aquarium-fish GitHub Wiki

GitHub gate driver

GitHub is the first gate driver. It enables GitHub Actions and Workflows to work with the Aquarium complex by providing self-hosted runners.

Features

  • REST API Pull way
  • Webhook Push way (TODO)
  • Automatic balancing of API rates (TODO)
  • Managing stale runners (in case images work improperly)
  • Filtering by repo full name

Security

This gate was designed around the GitHub API, but it has flaws with isolation - one of the important Aquarium features. That means you will need to open your workers to github.com (yes, even if you use enterprise). Both the workers and the code checkout are affected, so you will have a hard time:

  • Controlling your dependencies with a firewall, because that will open access to everything GitHub hosts
  • Preventing leaks of your intellectual property - anyone who has access to the pipelines will be able to push your code to a public repo and call it a day

To protect your IP and dependencies you could, in theory, use some sort of MITM HTTPS proxy, but that will complicate things quite a bit, so good luck!

Firewalls

You might think that you can at least restrict access to GitHub via CIDRs, right? That is not easy, because it's everywhere! The list is huge, and you will quickly find that even after consolidation it holds well over 3000 entries (IPv4 + IPv6).

But in general you need to make sure your workers have access to the following services to be able to use all GitHub features:

  • https://api.github.com:443 - general
  • ssh://github.com:22 - code checkout through SSH
  • https://github.com:443 - code checkout through HTTPS
  • https://codeload.github.com:443 - actions
  • https://*.actions.githubusercontent.com:443 - actions

You can find all the currently required CIDRs at https://api.github.com/meta (the most critical ones are "api" and "actions*").
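
To get a feel for the consolidation mentioned above, here is a minimal Python sketch that merges the CIDR lists of selected keys into the smallest equivalent set. It assumes the /meta response has already been loaded into a dict whose keys map to lists of CIDR strings; the function name and the sample addresses (documentation-only ranges) are made up for illustration.

```python
import ipaddress

def collapse_meta_cidrs(meta, keys=("api", "actions")):
    """Merge the CIDR lists under the chosen keys into minimal IPv4 and IPv6 sets."""
    v4, v6 = [], []
    for key in keys:
        for cidr in meta.get(key, []):
            net = ipaddress.ip_network(cidr, strict=False)
            (v4 if net.version == 4 else v6).append(net)
    # collapse_addresses() only accepts networks of a single IP version per call
    return (list(ipaddress.collapse_addresses(v4)),
            list(ipaddress.collapse_addresses(v6)))

# Hand-written sample in the assumed shape of the /meta response
sample_meta = {
    "api": ["192.0.2.0/25", "192.0.2.128/25", "2001:db8::/33"],
    "actions": ["192.0.2.0/24", "198.51.100.0/24", "2001:db8:8000::/33"],
}
v4, v6 = collapse_meta_cidrs(sample_meta)
print([str(n) for n in v4], [str(n) for n in v6])
```

Overlapping and adjacent ranges are merged, so the resulting list is what you would feed into firewall rules. Add any other "actions*" keys you need to the keys argument.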

Basics

How to use in GitHub Actions workflow

You just need to specify the label you want for the workflow like this:

jobs:
  test1:
    runs-on:
      - self-hosted
      - <AQUARIUM_LABEL_NAME>[:VERSION]

Version pinning is useful for releases, where you want to reproduce the environment exactly, but for the current main branch it's better to use the latest label (no need to specify VERSION).
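
The NAME[:VERSION] label format can be illustrated with a small Python sketch. This is not how the gate itself parses labels (that is not described here); the function name and the example label are hypothetical.

```python
def parse_runner_label(label):
    """Split 'NAME[:VERSION]' into (name, version); version is None when omitted."""
    name, sep, version = label.partition(":")
    if not name:
        raise ValueError(f"empty label name in {label!r}")
    # An omitted (or empty) version means "use the latest label version"
    return name, (version if sep and version else None)

print(parse_runner_label("my-build-label:3"))
print(parse_runner_label("my-build-label"))
```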

There are 2 ways to serve GitHub with the requested self-hosted workers: receiving webhook requests and checking directly through the REST API. Each method has pros and cons, but they really work best together, so if you can afford it I would recommend using both.

  • Receiving webhook requests (Push) - you need to set up a public internet endpoint to receive the github.com requests, and you can't verify whether a request was lost on the way. It is still worth a shot, because the reaction to a request is immediate and it can scale as much as you can imagine.
  • Periodic update (Pull) - requires direct access to github.com and auth on GitHub, is limited by the API rate, causes delays and is not really scalable. But it is easy to set up and quite reliable.
  • Hybrid (Push+Pull) - when both ways work together, they cancel out each other's weak points. This way you can be sure you don't miss a request, and the reaction to adding a new repo is as quick as possible.

If you are just starting and want to play with this gate without investing much in infrastructure, just choose the Pull way: simple Token auth and a repo webhook will work well for you.

API Rate Budget

When using the GitHub API you need to understand the request budget: by default it's 5000 requests/h for a Token or an App, but 15000/h for an enterprise App (you can read more in the GitHub REST docs). To properly utilize the given budget, the Gate calculates how much is spent on:

  • Receiving hooks: 2 requests for each repo that passed the filter
  • The number of hooks the Gate found and the rolling number of deliveries received over the last 24 hours

All these numbers allow the Gate to keep the delivery check interval as low as possible (down to the check limit, which is set to 30s by default). But be careful if you want to use a lot of repositories - the request budget could easily be exhausted, so watch the logs/monitoring to keep the budget under control.

If Webhook Push is enabled as well, much less of the budget will be used, because the Gate will skip the delivered events and focus only on the undelivered ones.
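
As a rough back-of-the-envelope aid, the Python sketch below estimates the shortest check interval that stays within the hourly budget. It is a simplification and not the Gate's actual algorithm: it only counts the 2 requests per filtered repo and ignores the delivery checks. The function and parameter names are made up for illustration.

```python
def min_check_interval(repo_count, budget_per_hour=5000,
                       requests_per_repo=2, floor_seconds=30.0):
    """Smallest polling interval (seconds) that keeps hourly usage within budget."""
    if repo_count <= 0:
        return floor_seconds
    requests_per_cycle = repo_count * requests_per_repo
    max_cycles_per_hour = budget_per_hour / requests_per_cycle
    # Never go below the configured check limit (30s by default)
    return max(floor_seconds, 3600.0 / max_cycles_per_hour)

print(min_check_interval(10))                          # small setup: the 30s floor applies
print(min_check_interval(100))                         # token budget, 100 repos
print(min_check_interval(100, budget_per_hour=15000))  # enterprise App budget
```

Under these assumptions 100 repositories already push the interval well above the 30s floor on the default budget, which is why the number of filtered repositories matters.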

Configuration

GitHub side

Whichever way you choose (or both), you need to create a Webhook in GitHub for the org or repository. If you need to use Pull as well, please create an Auth method to access the GitHub API.

Choose Auth method

Please make sure you know the difference between the auth methods, especially if you are doing this for your company. Otherwise you will need to spend additional time on migration (which is possible, but takes quite a bit of time).

Fine-grained Token

The easiest way for testing, but it could cause issues in the long run.

  1. Go to https://github.com/settings/personal-access-tokens
  2. Click "Generate new token" button
  3. Set whatever token name/description you like
  4. Resource owner is important - the token will allow access to hooks only within this organization
  5. In Repository permissions specify only the next ones:
    • Webhooks, Read only - it's needed to read webhooks via API
    • Administration, Read and Write - it's used to create and manage the self-hosted runners
  6. Copy and save the created token to put it later in your Fish github gate configuration

GitHub App

GitHub Apps have a number of benefits over a regular token - for example, in enterprise you get a higher API call limit: 15000 per hour instead of the default 5000 for a token.

Creating an App is a big topic, but here we will cover the simplest case.

  1. You need to choose where you want to create an App:
  2. Then pick an App name and put some description in. If you plan to use a number of Fish clusters for different purposes - it's better to reflect that in the App name - for example add -Org-Dev suffix to show
  3. Choose Homepage URL - it's required, but we will not use it because the app will be private
  4. Skip Callback URL - it's needed for users, and we will be the only user of this App
  5. Deactivate webhook - we will not need it for our purposes
  6. In Repository/Organization permissions specify only the Webhooks one.
  7. And select "Only on this account" in "Where can this GitHub App be installed?"
  8. Click "Create GitHub App"

After the app is created, you need to copy its Client ID (at the top) and generate a private key:

  1. In Private keys click the "Generate a private key" button
  2. Save the file somewhere and copy its contents

When it's done - it's time to install the app in the repos you want to read info from:

  1. Now in the same GitHub App settings pick "Install App" menu item
  2. Pick the account to use for installation and click "Install" button
  3. It will ask "How to install the App" - and you can choose all repos or just specific repos
  4. Pick the repos to restrict the App installation to and click the confirmation button
  5. After that, on the installed App page, you will see the numeric App Installation ID in the browser address bar - copy it to put later in the Fish github gate config

Setup Webhook

Webhooks are used for 2 purposes: to store deliveries, which are available through the API, and to send push requests to the github gate.

Pull-only way

Since a webhook is not just a sender but also a storage, it's relatively easy to use it both ways. To set up the webhook properly, follow these steps:

  1. Go to your repository settings and click the "Webhooks" menu item
  2. Click the "Add webhook" button
  3. Specify the Payload URL - please be careful here: you need to pick a safe, non-existent endpoint so you don't accidentally share your webhook information with hackers who could register the domain you mention. I would recommend using https://_aquarium_fish_github_gate_ because it is hard to create a DNS record for something like that, and it's descriptive.
  4. Set the Content type to "application/json"
  5. Create a long random secret (>32 chars) that is hard for a human to remember, and save it.
  6. It is recommended to enable SSL verification as an additional way to protect your webhook
  7. From the events list we need just one - Workflow jobs, because it notifies about compute resources that are requested or no longer needed.
  8. After that click "Add webhook"
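
For the secret in step 5, one possible way to generate it is Python's standard secrets module; this is just a convenient option, any cryptographically secure random generator will do.

```python
import secrets

# 48 random bytes encode to a 64-character URL-safe string, well above 32 chars
webhook_secret = secrets.token_urlsafe(48)
print(len(webhook_secret))
```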

Push way

To receive push events you need a clear communication path from the GitHub server to your Aquarium Fish node. If you are using github.com, you will need a public address or some sort of load balancer for your Fish cluster. With that comes the responsibility to protect your endpoint from all sorts of attacks, so please be careful with this way of receiving events.

With that said, once you have all the required components and have set up the infrastructure, just follow the Pull-only way and set the Payload URL to the actual endpoint that points to the gate's bind_address (from the Aquarium Fish side configuration).
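
Part of protecting a public endpoint is checking that a request really comes from GitHub. GitHub signs each delivery with the webhook secret and sends the result in the X-Hub-Signature-256 header as "sha256=" followed by the hex HMAC-SHA256 of the raw body. The Python sketch below shows that check in isolation; whether and how the gate performs it internally is not covered here, and the function name and sample values are illustrative.

```python
import hashlib
import hmac

def verify_signature(secret, payload, signature_header):
    """Check a GitHub 'X-Hub-Signature-256' header against the raw request body."""
    digest = hmac.new(secret.encode(), payload, hashlib.sha256).hexdigest()
    expected = ("sha256=" + digest).encode()
    # compare_digest avoids leaking the match position through timing
    return hmac.compare_digest(expected, (signature_header or "").encode())

# Simulate a delivery signed with the shared secret
secret = "example-secret-for-illustration-only"
body = b'{"action": "queued"}'
header = "sha256=" + hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()
print(verify_signature(secret, body, header))
```

The check must run on the raw request bytes, before any JSON parsing, because re-serialized JSON will usually not reproduce the exact bytes that were signed.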

Aquarium Fish side

Yes, you still need to configure the GitHub gate driver to make it work, because there are a lot of variables that can't be predicted.

Webhook Push

TODO

You just need to define the binding_address to listen on a port, so it will stream data.

REST API Pull

In general the configuration is not that hard, especially since you already have all the necessary things after configuring GitHub side. So let's put it all in Aquarium config.yml:

drivers:
  gates:
    github/github.com_<YOUR_ORG>:
      api_app_id: <GITHUB_APP_ID>
      api_app_install_id: <GITHUB_INSTALLATION_ID>
      api_app_key: |
        <GITHUB_APP_PEM_PRIVATE_KEY>
      filters:
        <ORG>/<REPO>:  # Supports path wildcards
          webhook_secret: <GITHUB_WEBHOOK_SECRET>
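
The exact wildcard semantics of the filter keys are not spelled out here. As a stand-in, the Python sketch below uses fnmatch-style matching on the repo full name; note that with fnmatch a "*" also matches "/", which may differ from what the gate does. The function name and sample patterns are hypothetical.

```python
from fnmatch import fnmatchcase

def first_matching_filter(repo_full_name, filters):
    """Return the first filter pattern that matches '<ORG>/<REPO>', or None."""
    for pattern in filters:
        if fnmatchcase(repo_full_name, pattern):
            return pattern
    return None

filters = ["my-org/service-*", "my-org/tools"]
print(first_matching_filter("my-org/service-api", filters))
print(first_matching_filter("other-org/service-api", filters))
```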

Runner setup

In general it's not that hard to configure the runner - it just needs to be fed with the URL, token and labels that this runner will serve. Aquarium provides the worker with a metadata mechanism, so the runner startup script can execute the following steps:

# You need to download and unpack github runner for your os/arch and after that:
$ ./config.sh --unattended --ephemeral --no-default-labels --url "$GITHUB_RUNNER_URL" \
    --token "$GITHUB_RUNNER_REG_TOKEN" --name "$GITHUB_RUNNER_NAME" \
    --labels "$GITHUB_RUNNER_LABELS" --work "$GITHUB_RUNNER_WORKSPACE"
$ ./run.sh
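
If the startup script is written in Python instead of shell, the same config.sh call could be assembled as below. This is a sketch that assumes the GITHUB_RUNNER_* values from the metadata are already available as a mapping (for example os.environ); the helper name and the sample values are made up.

```python
REQUIRED = ("GITHUB_RUNNER_URL", "GITHUB_RUNNER_REG_TOKEN", "GITHUB_RUNNER_NAME",
            "GITHUB_RUNNER_LABELS", "GITHUB_RUNNER_WORKSPACE")

def build_config_command(env):
    """Build the argument list for ./config.sh, failing early on missing values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing runner settings: " + ", ".join(missing))
    return ["./config.sh", "--unattended", "--ephemeral", "--no-default-labels",
            "--url", env["GITHUB_RUNNER_URL"],
            "--token", env["GITHUB_RUNNER_REG_TOKEN"],
            "--name", env["GITHUB_RUNNER_NAME"],
            "--labels", env["GITHUB_RUNNER_LABELS"],
            "--work", env["GITHUB_RUNNER_WORKSPACE"]]

# Sample values for illustration only
sample_env = {
    "GITHUB_RUNNER_URL": "https://github.com/my-org/my-repo",
    "GITHUB_RUNNER_REG_TOKEN": "example-token",
    "GITHUB_RUNNER_NAME": "fish-runner-1",
    "GITHUB_RUNNER_LABELS": "self-hosted,my-build-label",
    "GITHUB_RUNNER_WORKSPACE": "_work",
}
command = build_config_command(sample_env)
print(command)
```

The resulting list can be passed to subprocess.run(command, check=True), followed by subprocess.run(["./run.sh"], check=True). Passing a list instead of a shell string avoids quoting problems with labels or names.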

You will probably want to automate that and bake it in using Aquarium-Bait, so you don't need to deal with those commands anymore.
